Skip to content

Nmap/fix concurrency#5040

Merged
crn4 merged 23 commits intomainfrom
nmap/fix-concurrency
Jan 6, 2026
Merged

Nmap/fix concurrency#5040
crn4 merged 23 commits intomainfrom
nmap/fix-concurrency

Conversation

@crn4
Copy link
Copy Markdown
Contributor

@crn4 crn4 commented Jan 6, 2026

Describe your changes

Issue ticket number and link

Stack

Checklist

  • Is it a bug fix
  • Is a typo/documentation fix
  • Is a feature enhancement
  • It is a refactor
  • Created tests that fail without the change (if possible)

By submitting this pull request, you confirm that you have read and agree to the terms of the Contributor License Agreement.

Documentation

Select exactly one:

  • I added/updated documentation for this change
  • Documentation is not needed for this change (explain why)

Docs PR URL (required if "docs added" is checked)

Paste the PR link from https://github.com/netbirdio/docs here:

https://github.com/netbirdio/docs/pull/__

Summary by CodeRabbit

Release Notes

  • Bug Fixes

    • Fixed account protection to prevent overwrites from outdated peer updates
  • Improvements

    • Implemented batch processing for peer updates to enhance handling efficiency
    • Optimized incremental network map updates with refined resource management and account synchronization

✏️ Tip: You can customize this high-level summary in your review settings.

Copilot AI review requested due to automatic review settings January 6, 2026 11:41
@coderabbitai
Copy link
Copy Markdown
Contributor

coderabbitai Bot commented Jan 6, 2026

📝 Walkthrough

Walkthrough

The pull request refactors the network map's peer-addition handling to support batch processing instead of single-peer operations. It introduces a background queue mechanism in NetworkMapBuilder with explicit locking, replaces atomic pointers with direct references, and updates the controller to enqueue multiple peers together, while also adding serial guards to prevent account overwrites.

Changes

Cohort / File(s) Summary
Controller Batching
management/internals/controllers/network_map/controller/controller.go
Replaced single-peer onPeerAddedUpdNetworkMapCache with variadic onPeersAddedUpdNetworkMapCache for batch processing; prefetch resource maps locally before passing to GetPeerNetworkMap; removed UpdateAccountPointer call in enrichAccountFromHolder.
Account Serial Guard
management/server/types/holder.go
Added guard in AddAccount to prevent overwriting existing account if new serial is less than or equal to current serial.
NetworkMap Public API
management/server/types/networkmap.go
Added new public method Account.OnPeersAddedUpdNetworkMapCache(peerIds ...string); updated OnPeerAddedUpdNetworkMapCache and OnPeerDeletedUpdNetworkMapCache to pass Account instance as first argument to NetworkMapBuilder methods.
NetworkMapBuilder Concurrency & Batching
management/server/types/networkmapbuilder.go
Replaced atomic Pointer[Account] with plain *Account and explicit locking; introduced addPeerBatch type and incAddPeerLoop background worker for incremental peer additions; expanded PeerUpdateDelta to support multiple connected peers; updated OnPeerAddedIncremental and OnPeerDeleted to accept Account parameter; added EnqueuePeersForIncrementalAdd and related helpers for batch enqueuing.
Test Updates
management/server/types/networkmap_golden_test.go
Updated all OnPeerAddedIncremental and OnPeerDeleted calls to pass account context; added new test variant TestGetPeerNetworkMap_Golden_New_WithOnPeerAddedRouter_Batched for batched router additions.
User Peer Update Logic
management/server/user.go
Added conditional IncrementNetworkSerial call when peerIDs list is non-empty, executed before OnPeersUpdated notification.

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~50 minutes

Possibly related PRs

Suggested reviewers

  • pascal-fischer
  • pappz

Poem

🐰 Batch by batch, peers align,
No more locks that twist and twine,
Locks now guard what once was free,
Batched arrivals—harmony! ✨

Pre-merge checks and finishing touches

❌ Failed checks (2 warnings, 1 inconclusive)
Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 0.00% which is insufficient. The required threshold is 80.00%. You can run @coderabbitai generate docstrings to improve docstring coverage.
Description check ⚠️ Warning The pull request description is completely unfilled—it contains only the template structure with no substantive information about the changes, issue ticket, or rationale. Fill in the 'Describe your changes' section with a summary of the concurrency fixes and nmap batch processing improvements, provide the issue ticket number/link, select appropriate checklist items (likely 'Is a refactor' and 'Is a feature enhancement'), and explain why documentation is not needed.
Title check ❓ Inconclusive The title 'Nmap/fix concurrency' is vague and lacks specificity; it does not clearly convey the primary changes or intent beyond a general reference to concurrency improvements in the network map system. Consider a more descriptive title such as 'Batch peer network map updates with concurrent safety' or 'Implement batched incremental peer additions with concurrency fixes' to clearly indicate the main architectural changes.
✨ Finishing touches
  • 📝 Generate docstrings

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

@sonarqubecloud
Copy link
Copy Markdown

sonarqubecloud Bot commented Jan 6, 2026

Quality Gate Failed Quality Gate failed

Failed conditions
5 New issues
5 New Code Smells (required ≤ 0)

See analysis details on SonarQube Cloud

Catch issues before they fail your Quality Gate with our IDE extension SonarQube for IDE

Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 3

🤖 Fix all issues with AI Agents
In @management/server/types/holder.go:
- Around line 28-31: The current guard may call Network.CurrentSerial() on a nil
Network; update the conditional around h.accounts[account.Id] so you only call
CurrentSerial when both sides have non-nil Network pointers. Concretely, use a
nil-safe check like: retrieve a := h.accounts[account.Id], then if a != nil &&
a.Network != nil && account.Network != nil && a.Network.CurrentSerial() >=
account.Network.CurrentSerial() { return }; this prevents nil pointer
dereferences while preserving the original early-return logic.

In @management/server/types/networkmapbuilder.go:
- Around line 1154-1227: The method addPeersIncrementally interleaves b.apb.mu
and b.cache.mu locks, causing risky lock ordering and repeated b.apb.mu
re-locks; fix by removing in-loop b.apb.mu acquisitions: snapshot b.apb.ids and
b.apb.retryCount into local variables before unlocking b.apb.mu, operate on
those locals while holding b.cache.mu (calling updateIndexesForNewPeer,
buildPeerACLView, collectDeltasForNewPeer, etc.), and only re-acquire b.apb.mu
once at the end to reconcile changes (update/delete entries in b.apb.retryCount,
clear/append b.apb.ids and call b.apb.sg.Signal if needed); ensure helper uses
like enqueuePeersForIncrementalAdd are changed to enqueue to a local list and
invoked under apb.mu at the end so you maintain a single consistent lock
ordering (always acquire b.apb.mu before b.cache.mu when both are needed).
- Around line 126-127: The goroutine started by "go builder.incAddPeerLoop()"
leaks because incAddPeerLoop runs forever; add shutdown support by giving
NetworkMapBuilder a cancellable context or done channel and a sync.WaitGroup:
add fields (ctx context.Context, cancel context.CancelFunc, wg sync.WaitGroup)
or (done chan struct{}, wg sync.WaitGroup), modify the constructor that returns
*NetworkMapBuilder to create the context/cancel (or done channel), increment wg
before starting incAddPeerLoop and have incAddPeerLoop select on ctx.Done()/done
to exit its loop, and add a Close/Stop method on NetworkMapBuilder that calls
cancel()/closes done and waits for wg to finish so the goroutine is cleaned up.
🧹 Nitpick comments (3)
management/server/types/networkmap.go (1)

42-47: Inconsistent error handling between single and batch peer addition.

OnPeerAddedUpdNetworkMapCache returns an error while OnPeersAddedUpdNetworkMapCache returns nothing. The batch variant calls EnqueuePeersForIncrementalAdd which processes asynchronously, but callers have no way to know if enqueuing succeeded or if errors occurred during processing.

Consider either:

  1. Documenting that errors in batch processing are logged but not propagated
  2. Adding a callback or channel mechanism for error notification
management/server/types/networkmap_golden_test.go (1)

1128-1130: Using time.Sleep for synchronization can cause flaky tests.

The test relies on a fixed 100ms sleep to wait for asynchronous batch processing. Under high system load or in CI environments, this may not be sufficient, leading to intermittent test failures.

Consider using a synchronization mechanism such as a sync.WaitGroup, a done channel, or polling with a timeout to wait for the batch processing to complete.

management/server/types/networkmapbuilder.go (1)

29-32: Consider making batch size and retry limits configurable.

The constants szAddPeerBatch = 10 and maxPeerAddRetries = 20 are hardcoded. For different deployment sizes, these may need tuning. Consider making them configurable via environment variables or builder options.

📜 Review details

Configuration used: defaults

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 9bd578d and 3482614.

📒 Files selected for processing (6)
  • management/internals/controllers/network_map/controller/controller.go
  • management/server/types/holder.go
  • management/server/types/networkmap.go
  • management/server/types/networkmap_golden_test.go
  • management/server/types/networkmapbuilder.go
  • management/server/user.go
🧰 Additional context used
🧠 Learnings (1)
📚 Learning: 2025-11-28T12:20:47.254Z
Learnt from: bcmmbaga
Repo: netbirdio/netbird PR: 4849
File: management/internals/modules/zones/manager/manager.go:55-86
Timestamp: 2025-11-28T12:20:47.254Z
Learning: In the NetBird management server, DNS zones without records are automatically filtered out in network map generation (filterPeerAppliedZones in management/internals/controllers/network_map/controller/controller.go checks `len(zone.Records) == 0`). Therefore, CreateZone operations don't need to call UpdateAccountPeers since empty zones don't affect the network map.

Applied to files:

  • management/server/types/networkmap.go
  • management/server/types/networkmap_golden_test.go
  • management/server/user.go
  • management/server/types/networkmapbuilder.go
  • management/internals/controllers/network_map/controller/controller.go
🧬 Code graph analysis (4)
management/server/types/networkmap.go (2)
management/server/types/networkmapbuilder.go (1)
  • NetworkMapCache (34-60)
management/server/types/account.go (1)
  • Account (74-109)
management/server/user.go (1)
management/server/store/store.go (1)
  • Store (50-208)
management/server/types/networkmapbuilder.go (4)
management/server/types/account.go (1)
  • Account (74-109)
management/server/types/firewall_rule.go (2)
  • FirewallRuleDirectionIN (17-17)
  • FirewallRuleDirectionOUT (18-18)
route/route.go (1)
  • ID (48-48)
client/internal/acl/id/id.go (1)
  • RuleID (13-13)
management/internals/controllers/network_map/controller/controller.go (3)
route/route.go (1)
  • ID (48-48)
management/internals/controllers/network_map/interface.go (1)
  • Controller (23-39)
management/server/types/account.go (1)
  • Account (74-109)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (28)
  • GitHub Check: Relay / Unit (amd64, -race)
  • GitHub Check: Management / Unit (amd64, sqlite)
  • GitHub Check: Management / Unit (amd64, postgres)
  • GitHub Check: Management / Integration (amd64, postgres)
  • GitHub Check: Management / Unit (amd64, mysql)
  • GitHub Check: Relay / Unit (386)
  • GitHub Check: Client / Unit (amd64)
  • GitHub Check: Management / Integration (amd64, sqlite)
  • GitHub Check: Management / Benchmark (amd64, sqlite)
  • GitHub Check: Client / Unit (386)
  • GitHub Check: Management / Benchmark (API) (amd64, postgres)
  • GitHub Check: Management / Benchmark (amd64, postgres)
  • GitHub Check: Client (Docker) / Unit
  • GitHub Check: Management / Benchmark (API) (amd64, sqlite)
  • GitHub Check: Agent
  • GitHub Check: CodeQL analysis (go)
  • GitHub Check: Linux
  • GitHub Check: release
  • GitHub Check: release_ui
  • GitHub Check: Client / Unit
  • GitHub Check: release_ui_darwin
  • GitHub Check: Client / Unit
  • GitHub Check: Darwin
  • GitHub Check: Windows
  • GitHub Check: JS / Lint
  • GitHub Check: Client / Unit
  • GitHub Check: Android / Build
  • GitHub Check: iOS / Build
🔇 Additional comments (9)
management/server/user.go (1)

997-1001: LGTM!

The conditional serial increment before OnPeersUpdated is correct. It ensures the network serial is only incremented when peers are actually being expired, preventing unnecessary serial bumps.

management/server/types/networkmap_golden_test.go (1)

269-270: LGTM!

The test updates correctly reflect the new API signature that passes the account as the first argument to OnPeerAddedIncremental.

management/internals/controllers/network_map/controller/controller.go (3)

719-728: LGTM!

The batch processing logic is well-structured:

  1. Fetches the account once outside the loop
  2. Logs peer IDs being added for debugging
  3. Passes all peer IDs to onPeersAddedUpdNetworkMapCache in a single call

This reduces redundant account fetches and aligns with the PR's concurrency fix objectives.


489-492: LGTM!

The method signature change from single peerId to variadic peerIds ...string correctly supports batch peer processing while maintaining backward compatibility with the existing call pattern.


450-452: LGTM!

Extracting resourcePolicies and routers to local variables before passing to GetPeerNetworkMap improves readability and ensures consistent values are used if these methods had any side effects (even though they likely don't).

management/server/types/networkmapbuilder.go (4)

948-954: LGTM!

The updateAccountLocked helper correctly updates the builder's account only when the new account has a higher serial number, preventing stale data overwrites. The comment indicates the caller must hold the lock.


1688-1736: LGTM!

The mergeFrom method on PeerUpdateDelta correctly merges all delta fields while avoiding duplicates using slices.Contains checks. This supports the batch processing where multiple peer additions can affect the same target peer.


2042-2050: Improved deletion handling with explicit ACL rebuilds.

The refactored deletion flow now explicitly tracks peers that need ACL rebuilds (peersToRebuildACL) and rebuilds them after applying deletion updates. This is cleaner than the previous approach and ensures consistent state.


1175-1193: Lock ordering is consistent and safe—no deadlock or processing conflict exists.

The code maintains a consistent lock ordering throughout: apb.mu is always released before cache.mu is acquired (lines 1163), and when both locks are needed, cache.mu is held first followed by temporary apb.mu acquisitions. The pattern in incAddPeerLoop()addPeersIncrementally()enqueuePeersForIncrementalAdd() is safe: the queue is cloned and cleared before releasing apb.mu and acquiring cache.mu, so retry re-enqueueing adds peers to a fresh queue for the next iteration rather than causing duplicate processing in the current iteration. This retry behavior is intentional, not a defect.

Comment thread management/server/types/holder.go
Comment thread management/server/types/networkmapbuilder.go
Comment thread management/server/types/networkmapbuilder.go
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This pull request addresses concurrency issues in the network map handling system by refactoring the NetworkMapBuilder to handle peer additions through a batched asynchronous queue mechanism. The key changes involve replacing atomic pointers with direct account references, introducing a background goroutine for batched peer processing, and updating method signatures to accept account parameters directly.

  • Introduces a batched peer addition queue with retry logic processed by a background goroutine
  • Replaces atomic pointer usage for account storage with direct pointer management and serial-based updates
  • Updates OnPeerAddedIncremental and OnPeerDeleted methods to accept account parameters for better concurrency control

Reviewed changes

Copilot reviewed 6 out of 6 changed files in this pull request and generated 14 comments.

Show a summary per file
File Description
management/server/types/networkmapbuilder.go Core changes introducing batched peer addition queue with background goroutine, refactoring account pointer management from atomic to direct references, and implementing delta merging for batch operations
management/server/types/networkmap.go Updates method signatures to pass account references directly and adds new batch enqueueing method
management/server/types/networkmap_golden_test.go Updates test calls to match new API signatures and adds new test for batched peer addition
management/server/types/holder.go Adds serial number comparison to prevent stale account overwrites
management/internals/controllers/network_map/controller/controller.go Updates to use new batch peer addition API and removes UpdateAccountPointer call
management/server/user.go Adds network serial incrementing when peers are expired/updated

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment thread management/server/types/networkmapbuilder.go
Comment thread management/server/types/networkmapbuilder.go
Comment thread management/server/types/networkmapbuilder.go
Comment thread management/server/types/networkmap_golden_test.go
Comment thread management/server/types/networkmapbuilder.go
Comment thread management/server/types/networkmapbuilder.go
Comment thread management/server/types/networkmapbuilder.go
Comment thread management/server/types/networkmapbuilder.go
Comment thread management/server/types/holder.go
Comment thread management/server/types/networkmapbuilder.go
@crn4 crn4 merged commit 7142d45 into main Jan 6, 2026
48 of 53 checks passed
@crn4 crn4 deleted the nmap/fix-concurrency branch January 6, 2026 18:25
@coderabbitai coderabbitai Bot mentioned this pull request Apr 3, 2026
7 tasks
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants