Skip to content

[management] fix skip of ephemeral peers on deletion#5206

Merged
crn4 merged 4 commits intomainfrom
fix/ephemeral-peers
Jan 29, 2026
Merged

[management] fix skip of ephemeral peers on deletion#5206
crn4 merged 4 commits intomainfrom
fix/ephemeral-peers

Conversation

@crn4
Copy link
Copy Markdown
Contributor

@crn4 crn4 commented Jan 28, 2026

Describe your changes

Issue ticket number and link

Stack

Checklist

  • Is it a bug fix
  • Is a typo/documentation fix
  • Is a feature enhancement
  • It is a refactor
  • Created tests that fail without the change (if possible)

By submitting this pull request, you confirm that you have read and agree to the terms of the Contributor License Agreement.

Documentation

Select exactly one:

  • I added/updated documentation for this change
  • Documentation is not needed for this change (explain why)

bug fix

Docs PR URL (required if "docs added" is checked)

Paste the PR link from https://github.com/netbirdio/docs here:

https://github.com/netbirdio/docs/pull/__

Summary by CodeRabbit

  • Bug Fixes

    • Peer deletion now gracefully skips missing or ineligible peers and continues processing remaining peers instead of stopping on the first failure.
  • Chores

    • Enhanced logging: added finer-grained trace logs for skipped deletions and adjusted error logging to report multi-peer failures while keeping overall behavior unchanged.

✏️ Tip: You can customize this high-level summary in your review settings.

Copilot AI review requested due to automatic review settings January 28, 2026 22:44
@coderabbitai
Copy link
Copy Markdown
Contributor

coderabbitai Bot commented Jan 28, 2026

📝 Walkthrough

Walkthrough

Logging verbosity and wording were adjusted and peer-deletion error handling changed to skip problematic peers: NotFound and transactional errors are now logged at trace/error levels and the deletion process continues for remaining peers rather than returning early. (48 words)

Changes

Cohort / File(s) Summary
Ephemeral manager logging
management/internals/modules/peers/ephemeral/manager/ephemeral.go
Per-account deletion log changed from Debugf to Tracef and error message wording updated from singular "peer" to plural "peers".
Peer deletion flow & error handling
management/internals/modules/peers/manager.go
DeletePeers now treats NotFound as a skipped peer (logs trace), logs trace when peers are ineligible (connected/recently seen), and on transactional delete errors logs and continues to next peer instead of returning.

Sequence Diagram(s)

(omitted)

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Suggested reviewers

  • mlsmaycon
  • pascal-fischer

Poem

🐰 I nudge the logs and tiptoe past a peer,
If it's gone or busy, I whisper, "No fear."
I skip, I trace, I carry on the night—
A quiet rabbit keeping deletions light. 🥕

🚥 Pre-merge checks | ✅ 2 | ❌ 1
❌ Failed checks (1 warning)
Check name Status Explanation Resolution
Description check ⚠️ Warning The description is largely incomplete with placeholder sections and minimal details despite being marked as a bug fix. Complete the 'Describe your changes' section with specifics about what was fixed, add the issue ticket link, and explain why the fix improves ephemeral peer deletion handling.
✅ Passed checks (2 passed)
Check name Status Explanation
Title check ✅ Passed The title directly describes the main change: fixing skip behavior of ephemeral peers during deletion, which aligns with the code changes.
Docstring Coverage ✅ Passed No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing touches
  • 📝 Generate docstrings

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR fixes an issue where ephemeral peer deletion would fail for the entire batch if any single peer encountered an error. The fix allows the deletion process to continue processing remaining peers even when individual deletions fail.

Changes:

  • Modified DeletePeers to gracefully handle NotFound errors by skipping already-deleted peers
  • Changed error handling from aborting the loop to logging errors and continuing with remaining peers
  • Enhanced trace logging to provide better visibility into peer deletion and skip decisions

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 2 comments.

File Description
management/internals/modules/peers/manager.go Added graceful handling for NotFound errors, enhanced trace logging for skipped peers, and changed from returning errors to logging them and continuing the loop
management/internals/modules/peers/ephemeral/manager/ephemeral.go Updated logging levels and improved error message wording for batch peer deletion

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment thread management/internals/modules/peers/manager.go Outdated
Comment thread management/internals/modules/peers/manager.go
@sonarqubecloud
Copy link
Copy Markdown

Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🤖 Fix all issues with AI agents
In `@management/internals/modules/peers/manager.go`:
- Around line 161-164: The DeletePeers function currently declares an error
return but always logs errors and continues (best-effort), which is misleading;
change the function signature (DeletePeers) to return no error (remove the error
return) and treat it explicitly as a best-effort/ephemeral cleanup routine,
update its single caller (the ephemeral peer expiration cleanup) to stop
expecting an error, and keep the existing per-peer log-on-error-and-continue
behavior (log.WithContext(ctx).Errorf(...) and continue) so the contract matches
the implementation; alternatively, if you prefer to preserve an error return,
replace it with an aggregated error (e.g., collect errors into a slice or use a
multierror) and return that summary instead—pick one approach and apply it
consistently to DeletePeers and its caller.
🧹 Nitpick comments (1)
management/internals/modules/peers/manager.go (1)

118-124: Minor: time.Now() called twice may log a different threshold than evaluated.

The threshold is computed with time.Now() both in the condition (line 118) and in the log message (lines 122-123). The slight time difference between these calls means the logged threshold won't exactly match what was evaluated, which could cause confusion during debugging.

🔧 Suggested fix: compute threshold once
+			threshold := time.Now().Add(-(ephemeral.EphemeralLifeTime - 10 * time.Second))
-			if checkConnected && (peer.Status.Connected || peer.Status.LastSeen.After(time.Now().Add(-(ephemeral.EphemeralLifeTime - 10*time.Second)))) {
+			if checkConnected && (peer.Status.Connected || peer.Status.LastSeen.After(threshold)) {
 				log.WithContext(ctx).Tracef("DeletePeers: peer %s skipped (connected=%t, lastSeen=%s, threshold=%s, ephemeral=%t)",
 					peerID, peer.Status.Connected,
 					peer.Status.LastSeen.Format(time.RFC3339),
-					time.Now().Add(-(ephemeral.EphemeralLifeTime - 10*time.Second)).Format(time.RFC3339),
+					threshold.Format(time.RFC3339),
 					peer.Ephemeral)
 				return nil
 			}

Comment on lines 161 to 164
if err != nil {
return err
log.WithContext(ctx).Errorf("DeletePeers: failed to delete peer %s: %v", peerID, err)
continue
}
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor

🧩 Analysis chain

🏁 Script executed:

# First, let's examine the file structure and find the DeletePeers function
cd management/internals/modules/peers
wc -l manager.go

Repository: netbirdio/netbird

Length of output: 75


🏁 Script executed:

# Read the DeletePeers function - lines around 161-179 to see full context
sed -n '140,185p' management/internals/modules/peers/manager.go

Repository: netbirdio/netbird

Length of output: 1304


🏁 Script executed:

# Search for all callers of DeletePeers function
rg -n 'DeletePeers\s*\(' --type=go -g '!*_mock.go' -g '!*_test.go'

Repository: netbirdio/netbird

Length of output: 568


🏁 Script executed:

# Read the caller context in ephemeral manager
sed -n '180,210p' management/internals/modules/peers/ephemeral/manager/ephemeral.go

Repository: netbirdio/netbird

Length of output: 891


🏁 Script executed:

# Search for tests of DeletePeers to understand intended behavior
rg -n 'DeletePeers' --type=go -A 10 -B 2 | grep -E '(test|Test|_test\.go)' -A 10 -B 2

Repository: netbirdio/netbird

Length of output: 6933


🏁 Script executed:

# Look for all test files that test DeletePeers
find . -name '*_test.go' -type f -exec grep -l 'DeletePeers' {} \;

Repository: netbirdio/netbird

Length of output: 133


🏁 Script executed:

# Read the DeletePeers function signature and any comments
sed -n '99,110p' management/internals/modules/peers/manager.go

Repository: netbirdio/netbird

Length of output: 630


🏁 Script executed:

# Also check if there's an interface definition that documents expected behavior
rg -n 'interface Manager' -A 20 management/internals/modules/peers/manager.go

Repository: netbirdio/netbird

Length of output: 43


🏁 Script executed:

# Check the interface definition for DeletePeers
rg -n 'DeletePeers' management/internals/modules/peers/manager.go | head -5

Repository: netbirdio/netbird

Length of output: 619


🏁 Script executed:

# Look for the interface definition in the same file
sed -n '1,50p' management/internals/modules/peers/manager.go

Repository: netbirdio/netbird

Length of output: 2218


The error-on-continue behavior is intentional for ephemeral cleanup, but clarify the misleading error contract.

The function's design—logging errors and continuing to the next peer without failing—is appropriate for ephemeral peer cleanup (temporary peers will expire anyway). However, the function signature declares an error return type but always returns nil, which is misleading. The single caller (ephemeral peer expiration cleanup) correctly handles this, but the design could confuse future developers or cause bugs if the function is extended to other contexts.

Consider documenting this as best-effort cleanup, or distinguish between ephemeral cleanup (which can be best-effort) and other deletion scenarios that may require stricter error handling. Alternatively, return an error summary or retry count to make the contract explicit.

🤖 Prompt for AI Agents
In `@management/internals/modules/peers/manager.go` around lines 161 - 164, The
DeletePeers function currently declares an error return but always logs errors
and continues (best-effort), which is misleading; change the function signature
(DeletePeers) to return no error (remove the error return) and treat it
explicitly as a best-effort/ephemeral cleanup routine, update its single caller
(the ephemeral peer expiration cleanup) to stop expecting an error, and keep the
existing per-peer log-on-error-and-continue behavior
(log.WithContext(ctx).Errorf(...) and continue) so the contract matches the
implementation; alternatively, if you prefer to preserve an error return,
replace it with an aggregated error (e.g., collect errors into a slice or use a
multierror) and return that summary instead—pick one approach and apply it
consistently to DeletePeers and its caller.

@crn4 crn4 merged commit 0169e45 into main Jan 29, 2026
41 checks passed
@crn4 crn4 deleted the fix/ephemeral-peers branch January 29, 2026 09:58
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants