[management] fix ephemeral peers not being removed #5203
…onnect status upd on sync failure
📝 Walkthrough
Sync error handling now cancels per-peer routines when the initial sync send fails; ephemeral peer tracking in `AddPeer` is relocated to run after `IncrementNetworkSerial` and only for ephemeral peers. Import ordering was adjusted in one file.
Sequence Diagram(s): omitted (changes are local/small and do not introduce a new multi-component sequential flow).
Estimated code review effort: 🎯 3 (Moderate) | ⏱️ ~20 minutes
🚥 Pre-merge checks | ✅ 2 | ❌ 1
❌ Failed checks (1 warning)
✅ Passed checks (2 passed)
Actionable comments posted: 0
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
management/internals/shared/grpc/server.go (1)
304-310: Avoid re-locking the peer mutex on initial-sync failure.
`cancelPeerRoutines` acquires the same peer lock that `Sync` already holds, so this path will deadlock before the deferred unlock runs. Release the lock (or use a non-locking helper) before calling the cleanup. Lines 304-309.
🔧 Proposed fix (release lock before cleanup)

```diff
 err = s.sendInitialSync(ctx, peerKey, peer, netMap, postureChecks, srv, dnsFwdPort)
 if err != nil {
 	log.WithContext(ctx).Debugf("error while sending initial sync for %s: %v", peerKey.String(), err)
 	s.syncSem.Add(-1)
-	s.cancelPeerRoutines(ctx, accountID, peer)
+	if unlock != nil {
+		unlock()
+		unlock = nil
+	}
+	s.cancelPeerRoutines(ctx, accountID, peer)
 	return err
 }
```
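Go's `sync.Mutex` is not reentrant, so a goroutine that calls `Lock` on a mutex it already holds blocks forever. The minimal sketch below reproduces the shape of the proposed fix under simplifying assumptions: `server`, `syncPeer`, `sendInitialSync` and `cancelPeerRoutines` here are hypothetical stand-ins guarded by a single mutex, not NetBird's actual types or its per-peer lock helper.

```go
package main

import (
	"errors"
	"fmt"
	"sync"
)

// server is a hypothetical stand-in for the gRPC server; a single mutex
// plays the role of the per-peer lock.
type server struct {
	mu        sync.Mutex // sync.Mutex is not reentrant
	cancelled bool       // set once cancelPeerRoutines has run
}

// cancelPeerRoutines takes the same lock that syncPeer holds, mirroring
// the situation flagged in the review.
func (s *server) cancelPeerRoutines() {
	s.mu.Lock()
	defer s.mu.Unlock()
	s.cancelled = true
}

// sendInitialSync is a hypothetical stand-in that always fails.
func (s *server) sendInitialSync() error {
	return errors.New("stream closed")
}

func (s *server) syncPeer() error {
	s.mu.Lock()
	unlock := s.mu.Unlock
	// Release the lock on exit unless an earlier path already did.
	defer func() {
		if unlock != nil {
			unlock()
		}
	}()

	if err := s.sendInitialSync(); err != nil {
		// Drop the lock first: calling cancelPeerRoutines while still
		// holding s.mu would block this goroutine forever.
		unlock()
		unlock = nil
		s.cancelPeerRoutines()
		return err
	}
	return nil
}

func main() {
	s := &server{}
	err := s.syncPeer()
	fmt.Println(err, s.cancelled) // stream closed true
}
```

Setting `unlock` to `nil` after the early release is what stops the deferred closure from unlocking an already-unlocked mutex, which Go treats as a fatal runtime error.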
🧹 Nitpick comments (1)
management/server/peer.go (1)
758-761: Move `TrackEphemeralPeer` outside the transaction to avoid side effects on rollback.
This call can run even if the transaction ultimately fails or rolls back, leaving the controller tracking a peer that was never committed. Consider deferring it until after `ExecuteInTransaction` succeeds. Lines 758-761.
♻️ Suggested change
```diff
-	if ephemeral {
-		// we should track ephemeral peers to be able to clean them if the peer don't sync and be marked as connected
-		am.networkMapController.TrackEphemeralPeer(ctx, newPeer)
-	}
```

Then, after the transaction succeeds (e.g., right after the retry loop):

```go
if ephemeral {
	am.networkMapController.TrackEphemeralPeer(ctx, newPeer)
}
```
Pull request overview
Fixes cleanup behavior for ephemeral peers by ensuring they are tracked appropriately during peer registration and by cleaning up peer routines when sync initialization fails.
Changes:
- Track ephemeral peers during `AddPeer` when `ephemeral` is true (instead of only when `temporary` is true).
- Cancel peer-related routines when `sendInitialSync` fails in the gRPC `Sync` flow.
- Minor import reordering in the gRPC server.
Reviewed changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated 3 comments.
| File | Description |
|---|---|
| management/server/peer.go | Adjusts when/which peers are tracked as ephemeral during peer creation. |
| management/internals/shared/grpc/server.go | Adds peer cleanup on initial sync send failure (and reorders an import). |
```go
}

if ephemeral {
	// we should track ephemeral peers to be able to clean them if the peer don't sync and be marked as connected
```
Grammar in this comment is incorrect: use "doesn't" instead of "don't", and consider rephrasing "and be marked" -> "and isn't marked" for clarity.
```diff
-	// we should track ephemeral peers to be able to clean them if the peer don't sync and be marked as connected
+	// we should track ephemeral peers to be able to clean them if the peer doesn't sync and isn't marked as connected
```
Describe your changes
Issue ticket number and link
Stack
Checklist
Documentation
Select exactly one:
bug fix
Docs PR URL (required if "docs added" is checked)
Paste the PR link from https://github.com/netbirdio/docs here:
https://github.com/netbirdio/docs/pull/__
Summary by CodeRabbit
Bug Fixes
Refactor