📝 Walkthrough

The pull request refactors the network map's peer-addition handling to support batch processing instead of single-peer operations. It introduces a background queue mechanism in NetworkMapBuilder with explicit locking, replaces atomic pointers with direct references, and updates the controller to enqueue multiple peers together, while also adding serial guards to prevent account overwrites.

Changes
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~50 minutes

Possibly related PRs
Suggested reviewers
Poem
Pre-merge checks and finishing touches

❌ Failed checks (2 warnings, 1 inconclusive)
✨ Finishing touches
Actionable comments posted: 3
🤖 Fix all issues with AI Agents
In @management/server/types/holder.go:
- Around lines 28-31: The current guard may call Network.CurrentSerial() on a nil Network. Update the conditional around h.accounts[account.Id] so CurrentSerial is only called when both sides have non-nil Network pointers. Concretely: retrieve a := h.accounts[account.Id], then if a != nil && a.Network != nil && account.Network != nil && a.Network.CurrentSerial() >= account.Network.CurrentSerial() { return }. This prevents nil pointer dereferences while preserving the original early-return logic.
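Applied to minimal stand-ins for these types, the suggested guard could look like the sketch below. The struct shapes and field names are assumptions for illustration; only the nil-safe condition mirrors the suggestion.

```go
package main

import "fmt"

// Minimal stand-ins for the management types; field shapes are
// assumptions for illustration only.
type Network struct{ serial uint64 }

func (n *Network) CurrentSerial() uint64 { return n.serial }

type Account struct {
	Id      string
	Network *Network
}

type Holder struct{ accounts map[string]*Account }

// SetAccount keeps the stored account unless the incoming one has a newer
// serial, and only calls CurrentSerial when both Network pointers are non-nil.
func (h *Holder) SetAccount(account *Account) {
	if a := h.accounts[account.Id]; a != nil && a.Network != nil && account.Network != nil &&
		a.Network.CurrentSerial() >= account.Network.CurrentSerial() {
		return // stored copy is at least as new; keep it
	}
	h.accounts[account.Id] = account
}

func main() {
	h := &Holder{accounts: map[string]*Account{}}
	h.SetAccount(&Account{Id: "a", Network: &Network{serial: 5}})
	h.SetAccount(&Account{Id: "a", Network: &Network{serial: 3}}) // stale; ignored
	fmt.Println(h.accounts["a"].Network.CurrentSerial())          // prints 5
	h.SetAccount(&Account{Id: "a", Network: nil})                 // nil-safe: no dereference
}
```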
In @management/server/types/networkmapbuilder.go:
- Around lines 1154-1227: addPeersIncrementally interleaves the b.apb.mu and b.cache.mu locks, creating risky lock ordering and repeated b.apb.mu re-locks. Fix it by removing the in-loop b.apb.mu acquisitions: snapshot b.apb.ids and b.apb.retryCount into local variables before unlocking b.apb.mu, operate on those locals while holding b.cache.mu (calling updateIndexesForNewPeer, buildPeerACLView, collectDeltasForNewPeer, etc.), and re-acquire b.apb.mu only once at the end to reconcile changes (update or delete entries in b.apb.retryCount, clear or append b.apb.ids, and call b.apb.sg.Signal if needed). Change helpers such as enqueuePeersForIncrementalAdd to enqueue to a local list that is applied under apb.mu at the end, so a single consistent lock ordering is maintained (always acquire b.apb.mu before b.cache.mu when both are needed).
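The snapshot-then-reconcile shape described above can be sketched as follows. The builder, queue fields, and processPeer here are simplified stand-ins, not the real NetBird types; only the lock-phasing pattern is the point.

```go
package main

import (
	"fmt"
	"sync"
)

// Simplified stand-in for the builder; only the lock phasing matters.
type builder struct {
	apbMu      sync.Mutex
	queue      []string
	retryCount map[string]int

	cacheMu sync.Mutex
	cache   map[string]bool
}

func (b *builder) addPeersIncrementally() {
	// Phase 1: snapshot and clear the queue under apb.mu, then release it.
	b.apbMu.Lock()
	ids := append([]string(nil), b.queue...)
	b.queue = b.queue[:0]
	b.apbMu.Unlock()

	// Phase 2: do all cache work under cache.mu only, collecting retries locally.
	var retry []string
	b.cacheMu.Lock()
	for _, id := range ids {
		if !b.processPeer(id) {
			retry = append(retry, id)
		}
	}
	b.cacheMu.Unlock()

	// Phase 3: re-acquire apb.mu once at the end to reconcile retries.
	b.apbMu.Lock()
	for _, id := range retry {
		b.retryCount[id]++
		b.queue = append(b.queue, id)
	}
	b.apbMu.Unlock()
}

// processPeer pretends the peer "bad" fails on its first attempt;
// the caller holds cache.mu.
func (b *builder) processPeer(id string) bool {
	if id == "bad" && b.retryCount[id] == 0 {
		return false
	}
	b.cache[id] = true
	return true
}

func main() {
	b := &builder{retryCount: map[string]int{}, cache: map[string]bool{}, queue: []string{"good", "bad"}}
	b.addPeersIncrementally()
	fmt.Println(len(b.cache), len(b.queue)) // "good" cached, "bad" re-queued for retry
}
```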
- Around lines 126-127: The goroutine started by "go builder.incAddPeerLoop()" leaks because incAddPeerLoop runs forever. Add shutdown support by giving NetworkMapBuilder a cancellable context or done channel plus a sync.WaitGroup: add fields (ctx context.Context, cancel context.CancelFunc, wg sync.WaitGroup) or (done chan struct{}, wg sync.WaitGroup), have the constructor that returns *NetworkMapBuilder create the context/cancel (or done channel) and increment wg before starting incAddPeerLoop, make incAddPeerLoop select on ctx.Done()/done so it can exit its loop, and add a Close/Stop method on NetworkMapBuilder that calls cancel()/closes done and waits on wg so the goroutine is cleaned up.
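The context-plus-WaitGroup variant of this shutdown pattern could look like the sketch below. NetworkMapBuilder here is a stub, and the loop body is a placeholder; only the lifecycle wiring mirrors the suggestion.

```go
package main

import (
	"context"
	"fmt"
	"sync"
	"time"
)

// Stub of NetworkMapBuilder showing only the lifecycle fields suggested above.
type NetworkMapBuilder struct {
	ctx    context.Context
	cancel context.CancelFunc
	wg     sync.WaitGroup
}

func NewNetworkMapBuilder() *NetworkMapBuilder {
	ctx, cancel := context.WithCancel(context.Background())
	b := &NetworkMapBuilder{ctx: ctx, cancel: cancel}
	b.wg.Add(1) // count the loop before starting it
	go b.incAddPeerLoop()
	return b
}

func (b *NetworkMapBuilder) incAddPeerLoop() {
	defer b.wg.Done()
	ticker := time.NewTicker(10 * time.Millisecond)
	defer ticker.Stop()
	for {
		select {
		case <-b.ctx.Done():
			return // shutdown requested
		case <-ticker.C:
			// drain and process the pending peer queue here
		}
	}
}

// Close stops the background loop and blocks until it has exited.
func (b *NetworkMapBuilder) Close() {
	b.cancel()
	b.wg.Wait()
}

func main() {
	b := NewNetworkMapBuilder()
	b.Close()
	fmt.Println("loop stopped cleanly")
}
```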
🧹 Nitpick comments (3)
management/server/types/networkmap.go (1)
42-47: Inconsistent error handling between single and batch peer addition.
`OnPeerAddedUpdNetworkMapCache` returns an `error` while `OnPeersAddedUpdNetworkMapCache` returns nothing. The batch variant calls `EnqueuePeersForIncrementalAdd`, which processes asynchronously, but callers have no way to know whether enqueuing succeeded or whether errors occurred during processing. Consider either:
- Documenting that errors in batch processing are logged but not propagated
- Adding a callback or channel mechanism for error notification
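One possible shape for the channel-based notification option is sketched below. All names here (batchResult, enqueuePeers) are hypothetical and do not exist in the codebase; the point is only that the async path can report per-peer errors back to interested callers.

```go
package main

import (
	"errors"
	"fmt"
)

// Hypothetical per-peer result for the async batch path.
type batchResult struct {
	peerID string
	err    error
}

// enqueuePeers processes the batch asynchronously and reports each
// outcome on the results channel, closing it when the batch is done.
func enqueuePeers(results chan<- batchResult, peerIDs ...string) {
	go func() {
		for _, id := range peerIDs {
			var err error
			if id == "" { // stand-in for a real processing failure
				err = errors.New("empty peer id")
			}
			results <- batchResult{peerID: id, err: err}
		}
		close(results)
	}()
}

func main() {
	results := make(chan batchResult)
	enqueuePeers(results, "peer1", "")
	for r := range results {
		if r.err != nil {
			fmt.Printf("peer %q failed: %v\n", r.peerID, r.err)
		}
	}
}
```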
management/server/types/networkmap_golden_test.go (1)
1128-1130: Using `time.Sleep` for synchronization can cause flaky tests.

The test relies on a fixed 100ms sleep to wait for asynchronous batch processing. Under high system load or in CI environments, this may not be sufficient, leading to intermittent test failures.
Consider using a synchronization mechanism such as a `sync.WaitGroup`, a done channel, or polling with a timeout to wait for the batch processing to complete.

management/server/types/networkmapbuilder.go (1)
29-32: Consider making batch size and retry limits configurable.

The constants `szAddPeerBatch = 10` and `maxPeerAddRetries = 20` are hardcoded. For different deployment sizes, these may need tuning. Consider making them configurable via environment variables or builder options.
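A sketch of the environment-variable option, with the constants' current values as defaults. The variable names (NB_ADD_PEER_BATCH, NB_MAX_PEER_ADD_RETRIES) are invented for illustration, not existing NetBird settings.

```go
package main

import (
	"fmt"
	"os"
	"strconv"
)

// Current hardcoded values, kept as fallbacks.
const (
	defaultAddPeerBatch     = 10
	defaultMaxPeerAddRetries = 20
)

// intFromEnv returns the integer value of an environment variable,
// falling back to def when unset, non-numeric, or non-positive.
func intFromEnv(key string, def int) int {
	if v, err := strconv.Atoi(os.Getenv(key)); err == nil && v > 0 {
		return v
	}
	return def
}

func main() {
	batch := intFromEnv("NB_ADD_PEER_BATCH", defaultAddPeerBatch)
	retries := intFromEnv("NB_MAX_PEER_ADD_RETRIES", defaultMaxPeerAddRetries)
	fmt.Println(batch, retries) // defaults when the env vars are unset
}
```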
📜 Review details
Configuration used: defaults
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (6)
management/internals/controllers/network_map/controller/controller.go
management/server/types/holder.go
management/server/types/networkmap.go
management/server/types/networkmap_golden_test.go
management/server/types/networkmapbuilder.go
management/server/user.go
🧰 Additional context used
🧠 Learnings (1)
📚 Learning: 2025-11-28T12:20:47.254Z
Learnt from: bcmmbaga
Repo: netbirdio/netbird PR: 4849
File: management/internals/modules/zones/manager/manager.go:55-86
Timestamp: 2025-11-28T12:20:47.254Z
Learning: In the NetBird management server, DNS zones without records are automatically filtered out in network map generation (filterPeerAppliedZones in management/internals/controllers/network_map/controller/controller.go checks `len(zone.Records) == 0`). Therefore, CreateZone operations don't need to call UpdateAccountPeers since empty zones don't affect the network map.
Applied to files:
management/server/types/networkmap.go
management/server/types/networkmap_golden_test.go
management/server/user.go
management/server/types/networkmapbuilder.go
management/internals/controllers/network_map/controller/controller.go
🧬 Code graph analysis (4)
management/server/types/networkmap.go (2)
management/server/types/networkmapbuilder.go (1)
NetworkMapCache (34-60)
management/server/types/account.go (1)
Account (74-109)
management/server/user.go (1)
management/server/store/store.go (1)
Store(50-208)
management/server/types/networkmapbuilder.go (4)
management/server/types/account.go (1)
Account (74-109)
management/server/types/firewall_rule.go (2)
FirewallRuleDirectionIN (17-17)
FirewallRuleDirectionOUT (18-18)
route/route.go (1)
ID (48-48)
client/internal/acl/id/id.go (1)
RuleID (13-13)
management/internals/controllers/network_map/controller/controller.go (3)
route/route.go (1)
ID (48-48)
management/internals/controllers/network_map/interface.go (1)
Controller (23-39)
management/server/types/account.go (1)
Account (74-109)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (28)
- GitHub Check: Relay / Unit (amd64, -race)
- GitHub Check: Management / Unit (amd64, sqlite)
- GitHub Check: Management / Unit (amd64, postgres)
- GitHub Check: Management / Integration (amd64, postgres)
- GitHub Check: Management / Unit (amd64, mysql)
- GitHub Check: Relay / Unit (386)
- GitHub Check: Client / Unit (amd64)
- GitHub Check: Management / Integration (amd64, sqlite)
- GitHub Check: Management / Benchmark (amd64, sqlite)
- GitHub Check: Client / Unit (386)
- GitHub Check: Management / Benchmark (API) (amd64, postgres)
- GitHub Check: Management / Benchmark (amd64, postgres)
- GitHub Check: Client (Docker) / Unit
- GitHub Check: Management / Benchmark (API) (amd64, sqlite)
- GitHub Check: Agent
- GitHub Check: CodeQL analysis (go)
- GitHub Check: Linux
- GitHub Check: release
- GitHub Check: release_ui
- GitHub Check: Client / Unit
- GitHub Check: release_ui_darwin
- GitHub Check: Client / Unit
- GitHub Check: Darwin
- GitHub Check: Windows
- GitHub Check: JS / Lint
- GitHub Check: Client / Unit
- GitHub Check: Android / Build
- GitHub Check: iOS / Build
🔇 Additional comments (9)
management/server/user.go (1)
997-1001: LGTM! The conditional serial increment before `OnPeersUpdated` is correct. It ensures the network serial is only incremented when peers are actually being expired, preventing unnecessary serial bumps.

management/server/types/networkmap_golden_test.go (1)
269-270: LGTM! The test updates correctly reflect the new API signature that passes the account as the first argument to `OnPeerAddedIncremental`.

management/internals/controllers/network_map/controller/controller.go (3)
719-728: LGTM! The batch processing logic is well-structured:
- Fetches the account once outside the loop
- Logs peer IDs being added for debugging
- Passes all peer IDs to `onPeersAddedUpdNetworkMapCache` in a single call

This reduces redundant account fetches and aligns with the PR's concurrency-fix objectives.
489-492: LGTM! The method signature change from a single `peerId` to variadic `peerIds ...string` correctly supports batch peer processing while maintaining backward compatibility with the existing call pattern.
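The backward-compatibility property of the variadic change can be sketched as follows; the function name and parameters are simplified stand-ins for the real method.

```go
package main

import "fmt"

// updatePeers is a simplified stand-in: a variadic peerIDs parameter
// accepts both the old single-peer call shape and the new batch shape.
func updatePeers(accountID string, peerIDs ...string) int {
	_ = accountID
	return len(peerIDs) // pretend each peer is processed
}

func main() {
	fmt.Println(updatePeers("acc", "p1"))             // existing single-peer callers
	fmt.Println(updatePeers("acc", "p1", "p2", "p3")) // new batch callers
	batch := []string{"p4", "p5"}
	fmt.Println(updatePeers("acc", batch...)) // slices spread with ...
}
```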
450-452: LGTM! Extracting `resourcePolicies` and `routers` to local variables before passing to `GetPeerNetworkMap` improves readability and ensures consistent values are used if these methods had any side effects (even though they likely don't).
948-954: LGTM! The `updateAccountLocked` helper correctly updates the builder's account only when the new account has a higher serial number, preventing stale data overwrites. The comment indicates the caller must hold the lock.
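The "Locked"-suffix convention plus serial guard described here can be sketched like this. The types are simplified stand-ins; only the caller-holds-the-lock convention and the serial comparison reflect the review.

```go
package main

import (
	"fmt"
	"sync"
)

// Simplified stand-ins for the builder's account state.
type account struct{ serial uint64 }

type builderState struct {
	mu      sync.Mutex
	account *account
}

// updateAccountLocked must be called with b.mu held (the "Locked" suffix
// signals this); it keeps the stored account unless the incoming one
// carries a higher serial.
func (b *builderState) updateAccountLocked(a *account) {
	if b.account != nil && b.account.serial >= a.serial {
		return // incoming account is stale
	}
	b.account = a
}

// Update is the public entry point that takes the lock itself.
func (b *builderState) Update(a *account) {
	b.mu.Lock()
	defer b.mu.Unlock()
	b.updateAccountLocked(a)
}

func main() {
	b := &builderState{}
	b.Update(&account{serial: 7})
	b.Update(&account{serial: 4}) // stale; ignored
	fmt.Println(b.account.serial) // prints 7
}
```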
1688-1736: LGTM! The `mergeFrom` method on `PeerUpdateDelta` correctly merges all delta fields while avoiding duplicates using `slices.Contains` checks. This supports the batch processing where multiple peer additions can affect the same target peer.
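The duplicate-avoiding merge pattern can be sketched as below. The struct and its fields are illustrative stand-ins for PeerUpdateDelta, not the real definition.

```go
package main

import (
	"fmt"
	"slices"
)

// Illustrative stand-in for PeerUpdateDelta with two of its slice fields.
type peerUpdateDelta struct {
	addedPeerIDs []string
	routeIDs     []string
}

// mergeFrom appends other's entries, skipping values already present,
// mirroring the slices.Contains pattern the review describes.
func (d *peerUpdateDelta) mergeFrom(other *peerUpdateDelta) {
	for _, id := range other.addedPeerIDs {
		if !slices.Contains(d.addedPeerIDs, id) {
			d.addedPeerIDs = append(d.addedPeerIDs, id)
		}
	}
	for _, id := range other.routeIDs {
		if !slices.Contains(d.routeIDs, id) {
			d.routeIDs = append(d.routeIDs, id)
		}
	}
}

func main() {
	a := &peerUpdateDelta{addedPeerIDs: []string{"p1"}, routeIDs: []string{"r1"}}
	b := &peerUpdateDelta{addedPeerIDs: []string{"p1", "p2"}, routeIDs: []string{"r1"}}
	a.mergeFrom(b)
	fmt.Println(a.addedPeerIDs, a.routeIDs) // [p1 p2] [r1]
}
```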
2042-2050: Improved deletion handling with explicit ACL rebuilds.

The refactored deletion flow now explicitly tracks peers that need ACL rebuilds (`peersToRebuildACL`) and rebuilds them after applying deletion updates. This is cleaner than the previous approach and ensures consistent state.
1175-1193: Lock ordering is consistent and safe; no deadlock or processing conflict exists.

The code maintains a consistent lock ordering throughout: `apb.mu` is always released before `cache.mu` is acquired (line 1163), and when both locks are needed, `cache.mu` is held first followed by temporary `apb.mu` acquisitions. The pattern in `incAddPeerLoop()` → `addPeersIncrementally()` → `enqueuePeersForIncrementalAdd()` is safe: the queue is cloned and cleared before releasing `apb.mu` and acquiring `cache.mu`, so retry re-enqueueing adds peers to a fresh queue for the next iteration rather than causing duplicate processing in the current iteration. This retry behavior is intentional, not a defect.
Pull request overview
This pull request addresses concurrency issues in the network map handling system by refactoring the NetworkMapBuilder to handle peer additions through a batched asynchronous queue mechanism. The key changes involve replacing atomic pointers with direct account references, introducing a background goroutine for batched peer processing, and updating method signatures to accept account parameters directly.
- Introduces a batched peer addition queue with retry logic processed by a background goroutine
- Replaces atomic pointer usage for account storage with direct pointer management and serial-based updates
- Updates `OnPeerAddedIncremental` and `OnPeerDeleted` methods to accept account parameters for better concurrency control
Reviewed changes
Copilot reviewed 6 out of 6 changed files in this pull request and generated 14 comments.
Show a summary per file
| File | Description |
|---|---|
| management/server/types/networkmapbuilder.go | Core changes introducing batched peer addition queue with background goroutine, refactoring account pointer management from atomic to direct references, and implementing delta merging for batch operations |
| management/server/types/networkmap.go | Updates method signatures to pass account references directly and adds new batch enqueueing method |
| management/server/types/networkmap_golden_test.go | Updates test calls to match new API signatures and adds new test for batched peer addition |
| management/server/types/holder.go | Adds serial number comparison to prevent stale account overwrites |
| management/internals/controllers/network_map/controller/controller.go | Updates to use new batch peer addition API and removes UpdateAccountPointer call |
| management/server/user.go | Adds network serial incrementing when peers are expired/updated |




Describe your changes
Issue ticket number and link
Stack
Checklist
Documentation
Select exactly one:
Docs PR URL (required if "docs added" is checked)
Paste the PR link from https://github.com/netbirdio/docs here:
https://github.com/netbirdio/docs/pull/__
Summary by CodeRabbit
Release Notes
Bug Fixes
Improvements