feat(node/p2p): extend network actor test with block propagation. reduce test flakiness by theochap · Pull Request #2693 · op-rs/kona

theochap · 2025-08-12T19:25:02Z

Description

This PR achieves the following goals:

Extend the simple p2p connection tests of feat(net/test): add integration test skeleton #2679 with more comprehensive tests involving block signature and propagation (a follow PR will add the cases for higher block versions, it requires updating the rollup config with hardfork times)
Fixes the network bootstrap logic to remove flakiness. In particular, it uses kernel allocated ports for discovery/gossip to avoid static modification races.
It adds random valid block building logic

codecov · 2025-08-12T19:27:18Z

Codecov Report

❌ Patch coverage is 75.00000% with 1 line in your changes missing coverage. Please review.
✅ Project coverage is 84.7%. Comparing base (6751bfd) to head (2e142b2).
⚠️ Report is 3 commits behind head on main.
✅ All tests successful. No failed tests found.

Files with missing lines	Patch %	Lines
crates/node/gossip/src/driver.rs	75.0%	1 Missing ⚠️

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:

📦 JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

bin/node/src/commands/bootstore.rs

crates/node/peers/src/store.rs

crates/node/gossip/src/driver.rs

refcell · 2025-08-12T20:11:24Z

crates/node/service/src/actors/network/driver.rs

+        if self.enr_update {
+            // Update the local ENR socket to the gossip listen address.
+            // Parse the multiaddr to a socket address.
+            let ip_address = gossip_listen_addr
+                .iter()
+                .find_map(|p| match p {
+                    Protocol::Ip4(ip) => Some(IpAddr::V4(ip)),
+                    Protocol::Ip6(ip) => Some(IpAddr::V6(ip)),
+                    _ => None,
+                })
+                .ok_or_else(|| {
+                    NetworkDriverError::InvalidGossipListenAddr(gossip_listen_addr.clone())
+                })?;
+            let port = gossip_listen_addr
+                .iter()
+                .find_map(|p| match p {
+                    Protocol::Tcp(port) => Some(port),
+                    _ => None,
+                })
+                .ok_or_else(|| {
+                    NetworkDriverError::InvalidGossipListenAddr(gossip_listen_addr.clone())
+                })?;
+
+            self.discovery.disc.update_local_enr_socket(SocketAddr::new(ip_address, port), true);
+        }


This is a good candidate for a refactor in a separate method.

Would that really increase readability? We're not really doing that much here and we don't really have this pattern elsewhere. I can refactor this out if we ever decide to add this pattern in other parts of the codebase

refcell

No blockers from my side

claude · 2025-08-12T21:33:28Z

Claude finished @theochap's task —— View job

PR Review Complete ✅

I've completed my review of this PR which extends network actor tests with block propagation and reduces test flakiness. Here's my comprehensive feedback:

Todo List:

Read and understand the main networking changes
Review new test infrastructure and block generation logic
Analyze code quality and best practices adherence
Check for potential bugs or issues
Evaluate performance considerations
Assess security concerns
Review test coverage improvements
Provide comprehensive feedback

Review Feedback

✅ Strengths

Excellent Solution to Port Conflicts: The use of kernel-allocated ports (port 0) in bin/node/src/flags/p2p.rs:73-76 and crates/node/service/tests/actors/network/mocks/builder.rs:80 effectively eliminates test flakiness from port conflicts. This is a best practice for testing.
Comprehensive Test Coverage: The new test suite covers multiple important scenarios:
- Simple P2P connections (test_p2p_network_conn)
- Large network topology testing (test_large_network_conn)
- Block propagation from sequencer to validators (test_sequencer_network_conn)
- Full network propagation testing (test_sequencer_network_propagation)
Well-Structured Test Infrastructure: The refactoring from a single mocks.rs file to a proper module structure with mocks/mod.rs and mocks/builder.rs improves maintainability.
Deterministic Block Generation: The SeedGenerator approach provides reproducible test blocks while maintaining randomness between test runs.

⚠️ Issues and Concerns

Code Quality Issues

Excessive Memory Allocation (crates/node/service/tests/actors/generator/block_builder.rs:49):
```
let data = self.random_bytes(1024 * 1024);  // 1MB allocation per block!
```
This allocates 1MB of random data per test block, which is unnecessary and wasteful for testing.
Potential ENR Update Race Condition (crates/node/service/src/actors/network/driver.rs:75):
```
self.discovery.disc.update_local_enr_socket(SocketAddr::new(ip_address, port), true);
```
The ENR update happens immediately after getting the gossip address. Consider if this needs proper sequencing or error handling.
Unwrap Usage in Test Code (crates/node/service/tests/actors/network/mocks/builder.rs:44,47):
```
.map_err(|e| anyhow::anyhow!("Impossible...")).unwrap()
```
While acceptable in tests, the error messages suggest these cases are truly impossible, so consider using expect() with clearer messages.

Performance Considerations

Large Test Networks: The test_large_network_conn creates 10 networks and tests full mesh connectivity (O(n²) connections). Consider if this is necessary or if a smaller subset would suffice.
Timeout Configuration: The retry logic in is_connected_to_with_retries uses a 6-minute total delay, which may be excessive for CI environments.

Security Considerations

Test-Only Restrictions: The discovery table filter in builder.rs:70-76 correctly restricts to loopback addresses for testing, preventing external connections.
Proper Key Handling: The keypair generation and address derivation follows secure practices using secp256k1.

🔧 Recommendations

Reduce Block Generation Memory Usage:

// Instead of 1MB, use much smaller test data
let data = self.random_bytes(1024); // 1KB should be sufficient

Add Error Handling for ENR Updates:

if let Err(e) = self.discovery.disc.update_local_enr_socket(...) {
    warn!("Failed to update ENR socket: {}", e);
}

Consider Parameterizing Test Network Size:

const NETWORKS: usize = if cfg!(debug_assertions) { 5 } else { 10 };

Add Timeout Configuration for CI:

let timeout = Duration::from_secs(if env::var("CI").is_ok() { 60 } else { 360 });

✅ Best Practices Followed

✅ Proper separation of concerns with dedicated test modules
✅ Use of kernel-allocated ports for test isolation
✅ Comprehensive error handling with custom error types
✅ Good use of async testing patterns
✅ Proper cleanup with cancellation tokens
✅ Following Rust naming conventions and documentation standards

📋 Test Coverage Assessment

The PR significantly improves test coverage by adding:

P2P connection establishment testing
Block propagation verification
Network topology testing
Sequencer-validator interaction testing

The test infrastructure is well-designed and should catch regressions effectively.

🎯 Overall Assessment

This is a solid PR that addresses real testing flakiness issues while adding valuable test coverage. The kernel-allocated ports solution is excellent, and the new test infrastructure is well-architected. The main concerns are around resource usage optimization and some edge case error handling, but these don't block the functionality.

Recommendation: Approve with minor optimizations suggested above.

Copilot

Pull Request Overview

This PR extends the network actor testing infrastructure with comprehensive block propagation tests and reduces test flakiness by using kernel-allocated ports for discovery/gossip protocols.

Adds comprehensive sequencer network tests for block signature and propagation
Replaces static port allocation with kernel-allocated ports to avoid race conditions
Introduces random valid block building logic for payload generation

Reviewed Changes

Copilot reviewed 17 out of 19 changed files in this pull request and generated 5 comments.

Show a summary per file

File	Description
examples/gossip/src/main.rs	Enables ENR update for gossip command configuration
crates/node/service/tests/actors/utils.rs	Removes legacy seed generator utilities (file deleted)
crates/node/service/tests/actors/network/sequencer.rs	Adds new sequencer network connectivity and propagation tests
crates/node/service/tests/actors/network/p2p.rs	Adds basic P2P network connectivity tests
crates/node/service/tests/actors/network/mod.rs	Restructures network test module organization
crates/node/service/tests/actors/network/mocks/mod.rs	Creates new test network mock infrastructure
crates/node/service/tests/actors/network/mocks/builder.rs	Implements test network builder with configurable sequencer support
crates/node/service/tests/actors/network/mocks.rs	Removes legacy test network mock implementation (file deleted)
crates/node/service/tests/actors/mod.rs	Updates module structure to include generator utilities
crates/node/service/tests/actors/generator/seed.rs	Adds improved seed generator with thread-safe atomic operations
crates/node/service/tests/actors/generator/mod.rs	Creates generator module structure
crates/node/service/tests/actors/generator/block_builder.rs	Implements random valid block generation for different payload versions
crates/node/service/src/actors/network/driver.rs	Updates network driver to support ENR socket updates after gossip startup
crates/node/service/src/actors/network/config.rs	Adds ENR update configuration option
crates/node/service/src/actors/network/builder.rs	Enhances network builder with ENR update support
crates/node/service/Cargo.toml	Adds consensus-related dependencies for test block generation
crates/node/gossip/src/driver.rs	Updates gossip driver to return actual listen address after startup

_{Tip: Customize your code reviews with copilot-instructions.md. Create the file or learn how to get started.}

Copilot · 2025-08-12T21:34:45Z

crates/node/service/tests/actors/generator/seed.rs

+    }
+
+    fn next(&self) -> u64 {
+        self.0.fetch_add(1, Ordering::SeqCst)


Using Ordering::SeqCst is unnecessarily strong for a simple counter. Consider using Ordering::Relaxed since the exact ordering of seed generation doesn't require sequential consistency.

Suggested change

self.0.fetch_add(1, Ordering::SeqCst)

self.0.fetch_add(1, Ordering::Relaxed)

Copilot · 2025-08-12T21:34:46Z

crates/node/service/tests/actors/generator/block_builder.rs

+}
+
+impl SeedGenerator {
+    /// Generate a random op execution payload.


The function lacks documentation explaining the purpose and behavior of generating different payload versions. Consider adding a docstring that describes what makes each version valid and how they differ.

Suggested change

/// Generate a random op execution payload.

/// Generate a random op execution payload for the specified payload version.

///

/// This function creates a random, valid `OpExecutionPayloadEnvelope` by constructing a block

/// compatible with the given `PayloadVersion`. Each version corresponds to a different set of

/// protocol features and header/body fields:

///

/// - `V1`: The minimal version, omits withdrawals, blob gas, parent beacon block root, and requests hash.

/// - `V2`: Adds withdrawals and the corresponding withdrawals root to the block.

/// - `V3`: Adds blob gas fields and parent beacon block root, in addition to withdrawals.

/// - `V4`: Currently identical to V3, but reserved for future protocol extensions.

///

/// This allows testing and validation of payload handling across different protocol upgrades.

Copilot · 2025-08-12T21:34:46Z

crates/node/service/tests/actors/network/sequencer.rs

+/// We are setting up a linear network topology, and we check that the block propagates to every
+/// block of the network.
+#[tokio::test(flavor = "multi_thread")]
+async fn test_sequencer_network_propagation() -> anyhow::Result<()> {


[nitpick] The magic number 10 should be documented or made configurable. Consider adding a comment explaining why this specific number of networks is used for the propagation test.

Suggested change

async fn test_sequencer_network_propagation() -> anyhow::Result<()> {

async fn test_sequencer_network_propagation() -> anyhow::Result<()> {

// Use 10 networks to ensure propagation is tested across a reasonably sized network.

// This value is chosen to balance test coverage and execution time.

Copilot · 2025-08-12T21:34:46Z

crates/node/service/tests/actors/network/p2p.rs

+}
+
+#[tokio::test(flavor = "multi_thread")]
+async fn test_large_network_conn() -> anyhow::Result<()> {


[nitpick] The magic number 10 should be documented or made configurable. Consider adding a comment explaining why this specific number of networks is used for the large network test.

Suggested change

async fn test_large_network_conn() -> anyhow::Result<()> {

async fn test_large_network_conn() -> anyhow::Result<()> {

// Use 10 networks to simulate a moderately large P2P network and ensure scalability.

// This value balances test coverage and execution time.

Copilot · 2025-08-12T21:34:47Z

crates/node/service/tests/actors/network/mocks/mod.rs

+        other: &Self,
+    ) -> Result<(), TestNetworkError> {
+        (async || self.is_connected_to(other).await)
+            .retry(ExponentialBuilder::default().with_total_delay(Some(Duration::from_secs(360))))


[nitpick] The 360-second timeout is quite long and may slow down tests significantly if connections fail. Consider making this configurable or documenting why such a long timeout is necessary.

Suggested change

.retry(ExponentialBuilder::default().with_total_delay(Some(Duration::from_secs(360))))

///

/// The total retry timeout is configurable (default: 360 seconds). A long timeout may be necessary

/// in CI or slow environments, but can be reduced for faster feedback during development.

pub(super) async fn is_connected_to_with_retries(

&self,

other: &Self,

timeout_secs: Option<u64>,

) -> Result<(), TestNetworkError> {

let total_delay = Duration::from_secs(timeout_secs.unwrap_or(360));

(async || self.is_connected_to(other).await)

.retry(ExponentialBuilder::default().with_total_delay(Some(total_delay)))

claude · 2025-08-12T21:52:09Z

Claude finished @theochap's task —— View job

PR Review Complete ✅

I've completed my review of this PR which extends network actor tests with block propagation and reduces test flakiness. Here's my comprehensive feedback:

Todo List:

Read and understand the main networking changes
Review new test infrastructure and block generation logic
Analyze code quality and best practices adherence
Check for potential bugs or issues
Evaluate performance considerations
Assess security concerns
Review test coverage improvements
Provide comprehensive feedback

Review Feedback

✅ Strengths

Excellent Solution to Port Conflicts: The use of kernel-allocated ports (port 0) in bin/node/src/flags/p2p.rs:73-76 and crates/node/service/tests/actors/network/mocks/builder.rs:80 effectively eliminates test flakiness from port conflicts. This is a best practice for testing.
Comprehensive Test Coverage: The new test suite covers multiple important scenarios:
- Simple P2P connections (test_p2p_network_conn)
- Large network topology testing (test_large_network_conn)
- Block propagation from sequencer to validators (test_sequencer_network_conn)
- Full network propagation testing (test_sequencer_network_propagation)
Well-Structured Test Infrastructure: The refactoring from a single mocks.rs file to a proper module structure with mocks/mod.rs and mocks/builder.rs improves maintainability.
Deterministic Block Generation: The SeedGenerator approach provides reproducible test blocks while maintaining randomness between test runs.

⚠️ Issues and Concerns

Code Quality Issues

Excessive Memory Allocation (crates/node/service/tests/actors/generator/block_builder.rs:49):
```
let data = self.random_bytes(1024 * 1024);  // 1MB allocation per block!
```
This allocates 1MB of random data per test block, which is unnecessary and wasteful for testing.
Potential ENR Update Race Condition (crates/node/service/src/actors/network/driver.rs:75):
```
self.discovery.disc.update_local_enr_socket(SocketAddr::new(ip_address, port), true);
```
The ENR update happens immediately after getting the gossip address. Consider if this needs proper sequencing or error handling.
Unwrap Usage in Test Code (crates/node/service/tests/actors/network/mocks/builder.rs:44,47):
```
.map_err(|e| anyhow::anyhow!("Impossible...")).unwrap()
```
While acceptable in tests, the error messages suggest these cases are truly impossible, so consider using expect() with clearer messages.

Performance Considerations

Large Test Networks: The test_large_network_conn creates 10 networks and tests full mesh connectivity (O(n²) connections). Consider if this is necessary or if a smaller subset would suffice.
Timeout Configuration: The retry logic in is_connected_to_with_retries uses a 6-minute total delay, which may be excessive for CI environments.

Security Considerations

Test-Only Restrictions: The discovery table filter in builder.rs:70-76 correctly restricts to loopback addresses for testing, preventing external connections.
Proper Key Handling: The keypair generation and address derivation follows secure practices using secp256k1.

🔧 Recommendations

Reduce Block Generation Memory Usage:

// Instead of 1MB, use much smaller test data
let data = self.random_bytes(1024); // 1KB should be sufficient

Add Error Handling for ENR Updates:

if let Err(e) = self.discovery.disc.update_local_enr_socket(...) {
    warn!("Failed to update ENR socket: {}", e);
}

Consider Parameterizing Test Network Size:

const NETWORKS: usize = if cfg!(debug_assertions) { 5 } else { 10 };

Add Timeout Configuration for CI:

let timeout = Duration::from_secs(if env::var("CI").is_ok() { 60 } else { 360 });

✅ Best Practices Followed

✅ Proper separation of concerns with dedicated test modules
✅ Use of kernel-allocated ports for test isolation
✅ Comprehensive error handling with custom error types
✅ Good use of async testing patterns
✅ Proper cleanup with cancellation tokens
✅ Following Rust naming conventions and documentation standards

📋 Test Coverage Assessment

The PR significantly improves test coverage by adding:

P2P connection establishment testing
Block propagation verification
Network topology testing
Sequencer-validator interaction testing

The test infrastructure is well-designed and should catch regressions effectively.

🎯 Overall Assessment

This is a solid PR that addresses real testing flakiness issues while adding valuable test coverage. The kernel-allocated ports solution is excellent, and the new test infrastructure is well-architected. The main concerns are around resource usage optimization and some edge case error handling, but these don't block the functionality.

Recommendation: Approve with minor optimizations suggested above.

…uce test flakiness (op-rs/kona#2693) ## Description This PR achieves the following goals: - Extend the simple p2p connection tests of op-rs/kona#2679 with more comprehensive tests involving block signature and propagation (a follow PR will add the cases for higher block versions, it requires updating the rollup config with hardfork times) - Fixes the network bootstrap logic to remove flakiness. In particular, it uses kernel allocated ports for discovery/gossip to avoid static modification races. - It adds random valid block building logic

…uce test flakiness (op-rs/kona#2693) ## Description This PR achieves the following goals: - Extend the simple p2p connection tests of #2679 with more comprehensive tests involving block signature and propagation (a follow PR will add the cases for higher block versions, it requires updating the rollup config with hardfork times) - Fixes the network bootstrap logic to remove flakiness. In particular, it uses kernel allocated ports for discovery/gossip to avoid static modification races. - It adds random valid block building logic

theochap requested review from clabby, dhyaniarun1993, emhane, itschaindev, refcell and sadiq1971 as code owners August 12, 2025 19:25

theochap self-assigned this Aug 12, 2025

theochap added M-tests Meta: Testing related A-node Area: cl node (eq. Go op-node) handles single-chain consensus A-p2p Area: p2p labels Aug 12, 2025

theochap added this to the [kona-node] Phase 6 - Bay Release milestone Aug 12, 2025

theochap force-pushed the theo/network-actor-test branch from ef94b72 to e766f6e Compare August 12, 2025 19:56

refcell reviewed Aug 12, 2025

View reviewed changes

bin/node/src/commands/bootstore.rs Show resolved Hide resolved

refcell reviewed Aug 12, 2025

View reviewed changes

crates/node/peers/src/store.rs Outdated Show resolved Hide resolved

refcell reviewed Aug 12, 2025

View reviewed changes

crates/node/gossip/src/driver.rs Show resolved Hide resolved

refcell reviewed Aug 12, 2025

View reviewed changes

refcell approved these changes Aug 12, 2025

View reviewed changes

Base automatically changed from theo/network-actor-test to main August 12, 2025 21:28

theochap added 2 commits August 12, 2025 17:32

feat(node/p2p): extend network actor tests

2ae7426

feat(node/p2p): simplify p2p channels to reduce test flakiness

77f5bef

Copilot AI review requested due to automatic review settings August 12, 2025 21:33

theochap force-pushed the theo/kernel-allocated-ports branch from 1c31545 to 77f5bef Compare August 12, 2025 21:33

Copilot AI reviewed Aug 12, 2025

View reviewed changes

chore(node): updating cargo lock

2e142b2

theochap added this pull request to the merge queue Aug 12, 2025

Merged via the queue into main with commit 267c210 Aug 12, 2025
32 of 33 checks passed

theochap deleted the theo/kernel-allocated-ports branch August 12, 2025 22:46

	self.0.fetch_add(1, Ordering::SeqCst)
	self.0.fetch_add(1, Ordering::Relaxed)

-    /// Generate a random op execution payload.
+    /// Generate a random op execution payload for the specified payload version.
+    ///
+    /// This function creates a random, valid `OpExecutionPayloadEnvelope` by constructing a block
+    /// compatible with the given `PayloadVersion`. Each version corresponds to a different set of
+    /// protocol features and header/body fields:
+    ///
+    /// - `V1`: The minimal version, omits withdrawals, blob gas, parent beacon block root, and requests hash.
+    /// - `V2`: Adds withdrawals and the corresponding withdrawals root to the block.
+    /// - `V3`: Adds blob gas fields and parent beacon block root, in addition to withdrawals.
+    /// - `V4`: Currently identical to V3, but reserved for future protocol extensions.
+    ///
+    /// This allows testing and validation of payload handling across different protocol upgrades.

-            .retry(ExponentialBuilder::default().with_total_delay(Some(Duration::from_secs(360))))
+    ///
+    /// The total retry timeout is configurable (default: 360 seconds). A long timeout may be necessary
+    /// in CI or slow environments, but can be reduced for faster feedback during development.
+    pub(super) async fn is_connected_to_with_retries(
+        &self,
+        other: &Self,
+        timeout_secs: Option<u64>,
+    ) -> Result<(), TestNetworkError> {
+        let total_delay = Duration::from_secs(timeout_secs.unwrap_or(360));
+        (async || self.is_connected_to(other).await)
+            .retry(ExponentialBuilder::default().with_total_delay(Some(total_delay)))

Comments

Conversation

theochap commented Aug 12, 2025

Description

Uh oh!

codecov bot commented Aug 12, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Codecov Report

Uh oh!

Uh oh!

Uh oh!

Uh oh!

refcell Aug 12, 2025

Choose a reason for hiding this comment

Uh oh!

theochap Aug 12, 2025

Choose a reason for hiding this comment

Uh oh!

refcell left a comment

Choose a reason for hiding this comment

Uh oh!

claude bot commented Aug 12, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

PR Review Complete ✅

Todo List:

Review Feedback

✅ Strengths

⚠️ Issues and Concerns

Code Quality Issues

Performance Considerations

Security Considerations

🔧 Recommendations

✅ Best Practices Followed

📋 Test Coverage Assessment

🎯 Overall Assessment

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull Request Overview

Reviewed Changes

Uh oh!

Copilot AI Aug 12, 2025

Choose a reason for hiding this comment

Uh oh!

Copilot AI Aug 12, 2025

Choose a reason for hiding this comment

Uh oh!

Copilot AI Aug 12, 2025

Choose a reason for hiding this comment

Uh oh!

Copilot AI Aug 12, 2025

Choose a reason for hiding this comment

Uh oh!

Copilot AI Aug 12, 2025

Choose a reason for hiding this comment

Uh oh!

claude bot commented Aug 12, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

PR Review Complete ✅

Todo List:

Review Feedback

✅ Strengths

⚠️ Issues and Concerns

Code Quality Issues

Performance Considerations

Security Considerations

🔧 Recommendations

✅ Best Practices Followed

📋 Test Coverage Assessment

🎯 Overall Assessment

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

codecov bot commented Aug 12, 2025 •

edited

Loading

claude bot commented Aug 12, 2025 •

edited

Loading

claude bot commented Aug 12, 2025 •

edited

Loading