[Networking] Implements a silence period for GossipSub peer scoring. #5084
…coring-registry-startup-silence-period
Codecov Report
@@            Coverage Diff             @@
##           master    #5084      +/-   ##
==========================================
- Coverage   56.29%   56.29%   -0.01%
==========================================
  Files         994      994
  Lines       94963    95003      +40
==========================================
+ Hits        53463    53485      +22
- Misses      37526    37540      +14
- Partials     3974     3978       +4
config/default-config.yml
Outdated
# returned for all nodes will be 0, and any invalid control message notifications
# will be ignored. This configuration allows nodes to stabilize and initialize before
# applying penalties or processing invalid control message notifications.
gossipsub-scoring-registry-startup-silence-duration: 20m
How did you come up with this default and the 10m minimum? This seems too long for some node types and potentially too short for others (ENs). Rather than a timer, is it possible to trigger a signal when startup is complete?
Overall looks good 🚀, please consider applying the comments prior to merging.
network/p2p/scoring/registry.go
Outdated
func (r *GossipSubAppSpecificScoreRegistry) Start(parent irrecoverable.SignalerContext) {
	if !r.silencePeriodStartTime.IsZero() {
		parent.Throw(fmt.Errorf("gossipsub scoring registry started more than once"))
	}
	r.silencePeriodStartTime = time.Now()
	r.Component.Start(parent)
}
Please consider adding this as a worker in NewGossipSubAppSpecificScoreRegistry; overriding Start seems unnecessary.
}).AddWorker(func(ctx irrecoverable.SignalerContext, ready component.ReadyFunc) {
	if !reg.silencePeriodStartTime.IsZero() {
		ctx.Throw(fmt.Errorf("gossipsub scoring registry started more than once"))
	}
	reg.silencePeriodStartTime = time.Now()
})
network/p2p/scoring/registry.go
Outdated
// afterSilencePeriod returns true if registry silence period is over, false otherwise.
func (r *GossipSubAppSpecificScoreRegistry) afterSilencePeriod() bool {
	return time.Since(r.silencePeriodStartTime) > r.silencePeriodDuration
}
The silence period is a short startup interval. Once it has elapsed, comparing the time on every invocation of afterSilencePeriod is unnecessary and may add computational overhead, especially since this method is called frequently: on each incoming message to the node (or each invalid control message notification). Introducing an atomic boolean variable may be more efficient.
In the code below, we assume silencePeriodElpased is an atomic boolean that is initialized to false.
// afterSilencePeriod returns true if registry silence period is over, false otherwise.
func (r *GossipSubAppSpecificScoreRegistry) afterSilencePeriod() bool {
	if !r.silencePeriodElpased.Load() {
		if time.Since(r.silencePeriodStartTime) > r.silencePeriodDuration {
			r.silencePeriodElpased.Store(true)
			return true
		}
		return false
	}
	return true
}
network/p2p/scoring/registry_test.go
Outdated
@@ -387,8 +421,13 @@ func TestPersistingInvalidSubscriptionPenalty(t *testing.T) {
	}
}))

ctx, cancel := context.WithCancel(context.Background())
signalerCtx := irrecoverable.NewMockSignalerContext(t, ctx)
reg.Start(signalerCtx)
For all these tests, we should first ensure that the registry has started and is in Ready mode.
network/p2p/scoring/registry_test.go
Outdated
silencedNotificationLogs := atomic.NewInt32(0)
hook := zerolog.HookFunc(func(e *zerolog.Event, level zerolog.Level, message string) {
	if level == zerolog.DebugLevel {
		if message == "ignoring invalid control message notification for peer during silence period" {
Please consider extracting this message into a constant that is shared between the code and the test logic. That way, changing the log wording doesn't break the test.
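A minimal sketch of that idea follows. The constant name silencedNotificationMsg, the notifier type, and the logDebug callback are all hypothetical; a plain function stands in for the zerolog hook so the example stays within the standard library.

```go
package main

// silencedNotificationMsg is a hypothetical constant name. Both the
// production code and the test refer to it, so the wording lives in one place.
const silencedNotificationMsg = "ignoring invalid control message notification for peer during silence period"

// notifier is a simplified stand-in for the registry. logDebug stands in
// for a debug-level logger call (assumption, to avoid external dependencies).
type notifier struct {
	silenced bool
	logDebug func(msg string)
}

// onInvalidControlMessage logs and drops the notification while silenced.
func (n *notifier) onInvalidControlMessage() {
	if n.silenced {
		n.logDebug(silencedNotificationMsg)
		return
	}
	// Penalty handling would go here once the silence period is over.
}

// countMessages returns how many captured log lines equal want.
func countMessages(logged []string, want string) int {
	count := 0
	for _, msg := range logged {
		if msg == want {
			count++
		}
	}
	return count
}
```

A test then compares captured messages against silencedNotificationMsg instead of a string literal, so rewording the log line requires a change in only one place.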
Co-authored-by: Yahya Hassanzadeh, Ph.D. <[email protected]>
… github.com:onflow/flow-go into khalil/4979-scoring-registry-startup-silence-period
This PR addresses an issue observed in the mainnet24 spork, where nodes wrongly penalize peers for misbehavior during startup. The problem arises when nodes, not fully subscribed to all channels at startup, misinterpret messages from unsubscribed channels as malicious activity. This leads to unnecessary logs, inaccurate metrics, and the risk of early network fragmentation.
To solve this, the PR introduces a configurable silence duration (gossipsub-scoring-registry-startup-silence-duration). During this period:
- The score returned for all nodes is 0.
- Invalid control message notifications are ignored.
ref: #4979