This repository was archived by the owner on Jan 16, 2026. It is now read-only.

fix(node/p2p): fixing gossip stability by limiting the number of dials to a given peer#2025

Merged
theochap merged 3 commits into main from theo/fix-gossip-stability on Jun 6, 2025

@theochap
Member

@theochap theochap commented Jun 5, 2025

Description

This PR fixes the gossip stability issues we've experienced on initial connections to the op-nodes by preventing kona-nodes from redialing peers that are already connected or have already been dialed.

With this fix in place, we no longer see the peer drops in kurtosis.

On a side note: this PR also increases the log levels of some gossip events.

Explanation

My understanding of the situation:

  • The libp2p library imposes a substream limit for outgoing connections (a constant, equal to 5), to prevent DoS vectors when creating new connections.
  • It seems that every dial call opens a new outbound substream.
  • Since we dial peers quite often in the discovery layer, we cause the libp2p library to reach the substream limit.
  • When we reach the substream limit, the protocol gets disconnected and we no longer advertise it to peers. This causes the gossip connection to drop.
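
To make the failure mode in the list above concrete, here is a toy model of it. This is not libp2p code: `ToyGossipHandler`, its fields, and the constant name are hypothetical, and the exact threshold (disabling when the count reaches the limit rather than exceeds it) is an assumption taken from the wording above.

```rust
/// Limit quoted in the description above. The constant name is hypothetical.
const MAX_OUTBOUND_SUBSTREAMS: u32 = 5;

/// Toy stand-in for a per-connection gossip handler. NOT the libp2p type.
#[derive(Debug, Default)]
struct ToyGossipHandler {
    /// Outbound substreams opened so far on this connection.
    outbound_substreams: u32,
    /// Once set, the protocol is no longer advertised to the peer.
    disabled: bool,
}

impl ToyGossipHandler {
    /// Models one dial to the peer, assumed to open one outbound substream.
    /// Returns whether the protocol is still enabled after the dial.
    fn on_dial(&mut self) -> bool {
        if self.disabled {
            return false;
        }
        self.outbound_substreams += 1;
        // Assumption: the handler is disabled as soon as the limit is reached.
        if self.outbound_substreams >= MAX_OUTBOUND_SUBSTREAMS {
            self.disabled = true;
        }
        !self.disabled
    }
}
```

Under these assumptions, repeated dials from the discovery layer to the same peer are enough to disable the protocol, which matches the drops described above.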

Relevant piece of code in libp2p: https://github.com/libp2p/rust-libp2p/blob/d3e88cfc2ec944c3e6beb7117a762452cb855e38/protocols/gossipsub/src/handler.rs#L499-L511. In our case, the outbound event is consistently ConnectionEvent::FullyNegotiatedOutbound, which fires on every new dial.
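
A minimal sketch of the kind of guard this PR describes: skip the dial when the peer is already connected or has already been dialed. It is not the code from this PR. `DialGuard`, `PeerKey`, and all method names are hypothetical, peers are keyed by plain strings instead of libp2p's `PeerId`, and allowing a redial after a disconnect is an assumption.

```rust
use std::collections::HashSet;

/// Hypothetical peer identifier; a real node would key on libp2p's `PeerId`.
type PeerKey = String;

/// Tracks which peers are connected or have a dial in flight.
#[derive(Debug, Default)]
struct DialGuard {
    connected: HashSet<PeerKey>,
    dialed: HashSet<PeerKey>,
}

impl DialGuard {
    /// Returns `true` if the caller should dial `peer`, and records the attempt.
    /// Returns `false` for peers that are connected or were already dialed.
    fn should_dial(&mut self, peer: &str) -> bool {
        if self.connected.contains(peer) {
            return false;
        }
        // `HashSet::insert` returns `false` when the value was already present.
        self.dialed.insert(peer.to_owned())
    }

    /// Call when a connection to `peer` is established.
    fn on_connected(&mut self, peer: &str) {
        self.dialed.remove(peer);
        self.connected.insert(peer.to_owned());
    }

    /// Call when the connection closes. Assumption: the peer may be dialed again.
    fn on_disconnected(&mut self, peer: &str) {
        self.connected.remove(peer);
        self.dialed.remove(peer);
    }
}
```

The discovery layer would call `should_dial` before every dial and only proceed when it returns `true`, so a peer surfaced repeatedly by discovery is dialed at most once per connection lifetime.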

Development

@theochap theochap self-assigned this Jun 5, 2025
@theochap theochap added the K-fix (Kind: fix), A-node (Area: cl node (eq. Go op-node) handles single-chain consensus), and A-p2p (Area: p2p) labels Jun 5, 2025
@theochap theochap moved this to In Review in Project Tracking Jun 5, 2025
@codecov

codecov bot commented Jun 5, 2025

Codecov Report

Attention: Patch coverage is 0% with 56 lines in your changes missing coverage. Please review.

Project coverage is 83.1%. Comparing base (19e3ab5) to head (64c4969).
Report is 2 commits behind head on main.

✅ All tests successful. No failed tests found.

| Files with missing lines | Patch % | Lines |
|---|---|---|
| crates/node/p2p/src/gossip/driver.rs | 0.0% | 33 Missing ⚠️ |
| crates/node/p2p/src/rpc/request.rs | 0.0% | 23 Missing ⚠️ |

Contributor

@refcell refcell left a comment

Also closes #1854 I believe

@theochap
Member Author

theochap commented Jun 5, 2025

> Also closes #1854 I believe

Yes! thanks for the reminder

@theochap theochap force-pushed the theo/fix-gossip-stability branch from 62b0812 to 5688cd9 Compare June 5, 2025 23:19
@theochap theochap force-pushed the theo/add-peer-metadata branch from 2588122 to 69b40aa Compare June 5, 2025 23:19
@theochap theochap force-pushed the theo/fix-gossip-stability branch from 5688cd9 to 1850944 Compare June 5, 2025 23:19
@theochap theochap linked an issue Jun 5, 2025 that may be closed by this pull request
@theochap theochap force-pushed the theo/fix-gossip-stability branch from 1850944 to bd74a83 Compare June 5, 2025 23:38
Base automatically changed from theo/add-peer-metadata to main June 5, 2025 23:41
@theochap theochap force-pushed the theo/fix-gossip-stability branch from bd74a83 to 2b2a7bd Compare June 5, 2025 23:44
@theochap theochap force-pushed the theo/fix-gossip-stability branch from 2b2a7bd to 64c4969 Compare June 6, 2025 00:43
@theochap theochap enabled auto-merge June 6, 2025 00:45
@theochap theochap added this pull request to the merge queue Jun 6, 2025
Merged via the queue into main with commit f53d34b Jun 6, 2025
22 of 23 checks passed
@theochap theochap deleted the theo/fix-gossip-stability branch June 6, 2025 01:06
@github-project-automation github-project-automation bot moved this from In Review to Done in Project Tracking Jun 6, 2025
theochap added a commit to ethereum-optimism/optimism that referenced this pull request Dec 10, 2025
…s to a given peer (op-rs/kona#2025)

## Description

This PR fixes the gossip stability issues we've experienced on initial
connections to the `op-node`s by preventing `kona-node`s from redialing
peers that are already connected or have already been dialed.

With this fix in place, we no longer see the peer drops in kurtosis.

On a side note: this PR also increases the log levels of some gossip
events.

## Explanation

My understanding of the situation:

- The libp2p library imposes a substream limit for outgoing connections
(a constant, equal to 5), to prevent DoS vectors when creating new
connections.
- It seems that every dial call opens a new outbound substream.
- Since we dial peers quite often in the discovery layer, we cause the
libp2p library to reach the substream limit.
- When we reach the substream limit, the protocol gets disconnected and
we no longer advertise it to peers. This causes the gossip connection
to drop.

Relevant piece of code in libp2p:
https://github.com/libp2p/rust-libp2p/blob/d3e88cfc2ec944c3e6beb7117a762452cb855e38/protocols/gossipsub/src/handler.rs#L499-L511.
In our case, the outbound event is consistently
`ConnectionEvent::FullyNegotiatedOutbound`, which fires on every new
dial.

## Development

- Close op-rs/kona#1854
theochap added a commit to ethereum-optimism/optimism that referenced this pull request Jan 14, 2026
…s to a given peer (op-rs/kona#2025)

Development

Successfully merging this pull request may close these issues.

fix(node/p2p): remove infinite redial by default

2 participants