Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Networking] Implements a silence period for GossipSub peer scoring. #5084

Merged
merged 22 commits into from
Jan 18, 2024

Conversation

kc1116
Copy link
Contributor

@kc1116 kc1116 commented Nov 29, 2023

This PR addresses an issue observed in the mainnet24 spork, where nodes wrongly penalize peers for misbehavior during startup. The problem arises when nodes, not fully subscribed to all channels at startup, misinterpret messages from unsubscribed channels as malicious activity. This leads to unnecessary logs, inaccurate metrics, and the risk of early network fragmentation.

To solve this, the PR introduces a configurable silence duration (gossipsub-scoring-registry-startup-silence-duration). During this period:

  • Gossipsub queries for a peer's application-specific score return a default score of 0, with no penalties.
  • Invalid control message notifications are ignored, preventing the creation of spam records and penalties. This ensures nodes aren't penalized immediately after the silence period.

ref: #4979

@codecov-commenter
Copy link

codecov-commenter commented Dec 1, 2023

Codecov Report

Attention: 18 lines in your changes are missing coverage. Please review.

Comparison is base (f040031) 56.29% compared to head (bb1d6a3) 56.29%.

Files Patch % Lines
network/p2p/test/fixtures.go 0.00% 12 Missing ⚠️
network/netconf/flags.go 25.00% 3 Missing ⚠️
network/p2p/scoring/registry.go 88.88% 2 Missing and 1 partial ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##           master    #5084      +/-   ##
==========================================
- Coverage   56.29%   56.29%   -0.01%     
==========================================
  Files         994      994              
  Lines       94963    95003      +40     
==========================================
+ Hits        53463    53485      +22     
- Misses      37526    37540      +14     
- Partials     3974     3978       +4     
Flag Coverage Δ
unittests 56.29% <58.13%> (-0.01%) ⬇️

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@yhassanzadeh13 yhassanzadeh13 changed the title Khalil/4979 LibP2P node startup gossipsub silence period. [Networking] Implements a silence period for GossipSub peer scoring. Dec 11, 2023
# returned for all nodes will be 0, and any invalid control message notifications
# will be ignored. This configuration allows nodes to stabilize and initialize before
# applying penalties or processing invalid control message notifications.
gossipsub-scoring-registry-startup-silence-duration: 20m
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

how did you come up with this default, and the min 10m? This seems too long for some node types and potentially too short for others (ENs). rather than a timer, is it possible to trigger a signal when startup is complete?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@kc1116 maybe we can use this node.Ready channel instead of a silence period.

case <-node.Ready():

Copy link
Contributor Author

@kc1116 kc1116 Jan 3, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

b8a2fd2 The silence period default duration has been changed to 1 hour. Here is the follow up issue to create a component that can notify other components when it is ready

Copy link
Contributor

@yhassanzadeh13 yhassanzadeh13 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overall looks good 🚀, please consider applying the comments prior merging.

Comment on lines 190 to 196
func (r *GossipSubAppSpecificScoreRegistry) Start(parent irrecoverable.SignalerContext) {
if !r.silencePeriodStartTime.IsZero() {
parent.Throw(fmt.Errorf("gossipsub scoring registry started more than once"))
}
r.silencePeriodStartTime = time.Now()
r.Component.Start(parent)
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please consider adding this as a worker in NewGossipSubAppSpecificScoreRegistry, overwriting the Start seems unnecessary.

}).AddWorker(func(ctx irrecoverable.SignalerContext, ready component.ReadyFunc) {
		if !reg.silencePeriodStartTime.IsZero() {
			ctx.Throw(fmt.Errorf("gossipsub scoring registry started more than once"))
		}
		reg.silencePeriodStartTime = time.Now()	
	})

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

network/p2p/scoring/registry.go Outdated Show resolved Hide resolved
network/p2p/scoring/registry.go Outdated Show resolved Hide resolved
Comment on lines 369 to 373
// afterSilencePeriod returns true if registry silence period is over, false otherwise.
func (r *GossipSubAppSpecificScoreRegistry) afterSilencePeriod() bool {
return time.Since(r.silencePeriodStartTime) > r.silencePeriodDuration
}

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The silence period is a short startup interval. Once the silence period elapses, comparing the time on each invocation of the afterSilencePeriod is unnecessary and may apply a computation overhead. Especially, this method is called frequently on each incoming message to the node (or each invalid control message notification). Maybe introducing an atomic boolean variable acts more efficiently:

In the below, we assume silencePeriodElpased is an atomic boolean that is initialized to false.

// afterSilencePeriod returns true if registry silence period is over, false otherwise.
func (r *GossipSubAppSpecificScoreRegistry) afterSilencePeriod() bool {
	if !r.silencePeriodElpased.Load() {
		if time.Since(r.silencePeriodStartTime) > r.silencePeriodDuration {
			r.silencePeriodElpased.Store(true)
			return true
		}
		return false
	}
	return true
}

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

network/p2p/scoring/registry_test.go Outdated Show resolved Hide resolved
@@ -387,8 +421,13 @@ func TestPersistingInvalidSubscriptionPenalty(t *testing.T) {
}
}))

ctx, cancel := context.WithCancel(context.Background())
signalerCtx := irrecoverable.NewMockSignalerContext(t, ctx)
reg.Start(signalerCtx)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For all these tests, we should first ensure that the registry has started and is in Ready mode.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

silencedNotificationLogs := atomic.NewInt32(0)
hook := zerolog.HookFunc(func(e *zerolog.Event, level zerolog.Level, message string) {
if level == zerolog.DebugLevel {
if message == "ignoring invalid control message notification for peer during silence period" {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please consider casting this message as a constant that is shared between the code and test logic. In this way changing the log wording doesn't break the test.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@kc1116 kc1116 requested a review from peterargue January 9, 2024 16:52
@kc1116 kc1116 added this pull request to the merge queue Jan 18, 2024
Merged via the queue into master with commit aeb29e8 Jan 18, 2024
50 of 51 checks passed
@kc1116 kc1116 deleted the khalil/4979-scoring-registry-startup-silence-period branch January 18, 2024 03:14
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants