-
-
Notifications
You must be signed in to change notification settings - Fork 411
feat: handle chain stall in peer manager #7508
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Performance Report✔️ no performance regression detected Full benchmark results
|
nflaig
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM - we have tested this for a while on holesky-rescue branch and it was very useful to speed up our sync times + retrieving more details about our connected peers, see discord thread for more details
| return StatusScore.FAR_AHEAD; | ||
| } | ||
|
|
||
| // It seems dangerous to downscore peers that are far behind. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if we ever return FAR_BEHIND, then lodestar nodes will most likely disconnect peers that want to sync from us which is not great for the network. Should we remove FAR_BEHIND completely? or add more info for it
if we delete FAR_BEHIND enum then there is only 2 options: CLOSE_TO_US and FAR_AHEAD. At synced time, this function will always return CLOSE_TO_US. Hence the main usage for this function is for syncing time
please add all of these to the function description for later reference
twoeths
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
the feature is great, I dropped some comments/suggestions
| buckets: [0.001, 0.01, 0.1, 1], | ||
| }), | ||
| starved: register.gauge({ | ||
| name: "lodestar_peer_manager_starved_bool", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if you have time would be good to add a panel somewhere that uses the bool
|
seems like we need to update tests here
|
| ...formatNodePeer(peerIdStr, connections), | ||
| agentVersion: peerData?.agentVersion ?? "NA", | ||
| status: peerData?.status ? ssz.phase0.Status.toJson(peerData.status) : null, | ||
| metadata: peerData?.metadata ? ssz.altair.Metadata.toJson(peerData.metadata) : null, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
once Fusaka goes live we should update this to use fulu.Metadata, or just if-else based on clock slot but we need the type first #7774
Codecov ReportAttention: Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## unstable #7508 +/- ##
============================================
- Coverage 56.04% 56.03% -0.02%
============================================
Files 823 823
Lines 57892 57951 +59
Branches 4453 4461 +8
============================================
+ Hits 32448 32472 +24
- Misses 25376 25411 +35
Partials 68 68 🚀 New features to boost your workflow:
|
nflaig
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
|
🎉 This PR is included in v1.30.0 🎉 |
Motivation
Description
starvedcheck in the peer manager heartbeatstarvedwhen its head hasn't updates since the last heartbeat and is behindSTARVATION_THRESHOLD_SLOTSstarvedand has at leasttargetPeers, attempt to prune an additionaltargetPeers * STARVATION_PRUNE_RATIOpeersstarved, disallow pruning of peers that areFAR_AHEADof usstarvedor not, adjust the sorting for pruning to disfavor pruning peers that areFAR_AHEADof us