Skip to content

[management] fix ephemeral peers being not removed#5203

Merged
crn4 merged 2 commits intomainfrom
fix/ephemeral-peers
Jan 28, 2026
Merged

[management] fix ephemeral peers being not removed#5203
crn4 merged 2 commits intomainfrom
fix/ephemeral-peers

Conversation

@crn4
Copy link
Copy Markdown
Contributor

@crn4 crn4 commented Jan 28, 2026

Describe your changes

Issue ticket number and link

Stack

Checklist

  • Is it a bug fix
  • Is a typo/documentation fix
  • Is a feature enhancement
  • It is a refactor
  • Created tests that fail without the change (if possible)

By submitting this pull request, you confirm that you have read and agree to the terms of the Contributor License Agreement.

Documentation

Select exactly one:

  • I added/updated documentation for this change
  • Documentation is not needed for this change (explain why)

bug fix

Docs PR URL (required if "docs added" is checked)

Paste the PR link from https://github.com/netbirdio/docs here:

https://github.com/netbirdio/docs/pull/__

Summary by CodeRabbit

  • Bug Fixes

    • Improved synchronization error handling to ensure per-peer cleanup happens when initial sync fails, preventing lingering routines and resource leaks.
  • Refactor

    • Adjusted timing of ephemeral-peer tracking during peer onboarding so ephemeral peers are managed after network state updates, improving reliability and consistency.

✏️ Tip: You can customize this high-level summary in your review settings.

Copilot AI review requested due to automatic review settings January 28, 2026 16:29
@coderabbitai
Copy link
Copy Markdown
Contributor

coderabbitai Bot commented Jan 28, 2026

📝 Walkthrough

Walkthrough

Sync error handling now cancels per-peer routines when initial sync send fails; ephemeral peer tracking in AddPeer is relocated to run after IncrementNetworkSerial and only for ephemeral peers. Import ordering was adjusted in one file.

Changes

Cohort / File(s) Summary
gRPC Sync error handling
management/internals/shared/grpc/server.go
Added call to cancelPeerRoutines(ctx, accountID, peer) when s.sendInitialSync fails; import ordering changed.
Peer lifecycle timing
management/server/peer.go
Moved TrackEphemeralPeer(ctx, newPeer) to after IncrementNetworkSerial and made it conditional on ephemeral, changing when ephemeral peers are registered for tracking.

Sequence Diagram(s)

(omitted — changes are local/small and do not introduce a new multi-component sequential flow)

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Possibly related PRs

Suggested reviewers

  • mlsmaycon

Poem

🐰 A whisker-twitch, a tidy sweep,
Routines closed gently, secrets keep,
Ephemeral hops found their cue,
Serial bumped, then tracked anew,
Small fixes, clean and spry — hop through! 🥕

🚥 Pre-merge checks | ✅ 2 | ❌ 1
❌ Failed checks (1 warning)
Check name Status Explanation Resolution
Description check ⚠️ Warning The PR description follows the required template structure with checklist completed (bug fix confirmed, documentation noted as not needed), but lacks key details: the 'Describe your changes' and 'Issue ticket number and link' sections are empty. Add a detailed explanation of the changes in the 'Describe your changes' section and provide the related issue ticket number and link if available.
✅ Passed checks (2 passed)
Check name Status Explanation
Title check ✅ Passed The title '[management] fix ephemeral peers being not removed' directly and specifically describes the main bug fix being addressed in this pull request.
Docstring Coverage ✅ Passed No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing touches
  • 📝 Generate docstrings

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

pascal-fischer
pascal-fischer previously approved these changes Jan 28, 2026
Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
management/internals/shared/grpc/server.go (1)

304-310: Avoid re-locking the peer mutex on initial-sync failure.
cancelPeerRoutines acquires the same peer lock that Sync already holds, so this path will deadlock before the deferred unlock runs. Release the lock (or use a non-locking helper) before calling the cleanup. Line 304-309.

🔧 Proposed fix (release lock before cleanup)
	err = s.sendInitialSync(ctx, peerKey, peer, netMap, postureChecks, srv, dnsFwdPort)
	if err != nil {
		log.WithContext(ctx).Debugf("error while sending initial sync for %s: %v", peerKey.String(), err)
		s.syncSem.Add(-1)
-		s.cancelPeerRoutines(ctx, accountID, peer)
+		if unlock != nil {
+			unlock()
+			unlock = nil
+		}
+		s.cancelPeerRoutines(ctx, accountID, peer)
		return err
	}
🧹 Nitpick comments (1)
management/server/peer.go (1)

758-761: Move TrackEphemeralPeer outside the transaction to avoid side effects on rollback.
This call can run even if the transaction ultimately fails/rolls back, leaving the controller tracking a peer that never committed. Consider deferring it until after ExecuteInTransaction succeeds. Line 758-761.

♻️ Suggested change
-			if ephemeral {
-				// we should track ephemeral peers to be able to clean them if the peer don't sync and be marked as connected
-				am.networkMapController.TrackEphemeralPeer(ctx, newPeer)
-			}

Then, after the transaction succeeds (e.g., right after the retry loop):

if ephemeral {
	am.networkMapController.TrackEphemeralPeer(ctx, newPeer)
}

Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Fixes cleanup behavior for ephemeral peers by ensuring they are tracked appropriately during peer registration and by cleaning up peer routines when sync initialization fails.

Changes:

  • Track ephemeral peers during AddPeer when ephemeral is true (instead of only when temporary is true).
  • Cancel peer-related routines when sendInitialSync fails in the gRPC Sync flow.
  • Minor import reordering in the gRPC server.

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 3 comments.

File Description
management/server/peer.go Adjusts when/which peers are tracked as ephemeral during peer creation.
management/internals/shared/grpc/server.go Adds peer cleanup on initial sync send failure (and reorders an import).

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment thread management/server/peer.go Outdated
}

if ephemeral {
// we should track ephemeral peers to be able to clean them if the peer don't sync and be marked as connected
Copy link

Copilot AI Jan 28, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Grammar in this comment is incorrect: use "doesn't" instead of "don't", and consider rephrasing "and be marked" -> "and isn't marked" for clarity.

Suggested change
// we should track ephemeral peers to be able to clean them if the peer don't sync and be marked as connected
// we should track ephemeral peers to be able to clean them if the peer doesn't sync and isn't marked as connected

Copilot uses AI. Check for mistakes.
Comment thread management/server/peer.go
@sonarqubecloud
Copy link
Copy Markdown

@crn4 crn4 merged commit cead3f3 into main Jan 28, 2026
40 of 41 checks passed
@crn4 crn4 deleted the fix/ephemeral-peers branch January 28, 2026 17:24
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants