Skip to content
This repository was archived by the owner on Jan 16, 2026. It is now read-only.

Comments

feat(node/p2p): extend network actor test with block propagation. reduce test flakiness#2693

Merged
theochap merged 3 commits intomainfrom
theo/kernel-allocated-ports
Aug 12, 2025
Merged

feat(node/p2p): extend network actor test with block propagation. reduce test flakiness#2693
theochap merged 3 commits intomainfrom
theo/kernel-allocated-ports

Conversation

@theochap
Copy link
Member

Description

This PR achieves the following goals:

  • Extend the simple p2p connection tests of feat(net/test): add integration test skeleton #2679 with more comprehensive tests involving block signature and propagation (a follow PR will add the cases for higher block versions, it requires updating the rollup config with hardfork times)
  • Fixes the network bootstrap logic to remove flakiness. In particular, it uses kernel allocated ports for discovery/gossip to avoid static modification races.
  • It adds random valid block building logic

@theochap theochap self-assigned this Aug 12, 2025
@theochap theochap added M-tests Meta: Testing related A-node Area: cl node (eq. Go op-node) handles single-chain consensus A-p2p Area: p2p labels Aug 12, 2025
@codecov
Copy link

codecov bot commented Aug 12, 2025

Codecov Report

❌ Patch coverage is 75.00000% with 1 line in your changes missing coverage. Please review.
✅ Project coverage is 84.7%. Comparing base (6751bfd) to head (2e142b2).
⚠️ Report is 3 commits behind head on main.
✅ All tests successful. No failed tests found.

Files with missing lines Patch % Lines
crates/node/gossip/src/driver.rs 75.0% 1 Missing ⚠️

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • 📦 JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

@theochap theochap force-pushed the theo/network-actor-test branch from ef94b72 to e766f6e Compare August 12, 2025 19:56
Comment on lines +52 to +76
if self.enr_update {
// Update the local ENR socket to the gossip listen address.
// Parse the multiaddr to a socket address.
let ip_address = gossip_listen_addr
.iter()
.find_map(|p| match p {
Protocol::Ip4(ip) => Some(IpAddr::V4(ip)),
Protocol::Ip6(ip) => Some(IpAddr::V6(ip)),
_ => None,
})
.ok_or_else(|| {
NetworkDriverError::InvalidGossipListenAddr(gossip_listen_addr.clone())
})?;
let port = gossip_listen_addr
.iter()
.find_map(|p| match p {
Protocol::Tcp(port) => Some(port),
_ => None,
})
.ok_or_else(|| {
NetworkDriverError::InvalidGossipListenAddr(gossip_listen_addr.clone())
})?;

self.discovery.disc.update_local_enr_socket(SocketAddr::new(ip_address, port), true);
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a good candidate for a refactor in a separate method.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would that really increase readability? We're not really doing that much here and we don't really have this pattern elsewhere. I can refactor this out if we ever decide to add this pattern in other parts of the codebase

Copy link
Contributor

@refcell refcell left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No blockers from my side

Base automatically changed from theo/network-actor-test to main August 12, 2025 21:28
Copilot AI review requested due to automatic review settings August 12, 2025 21:33
@theochap theochap force-pushed the theo/kernel-allocated-ports branch from 1c31545 to 77f5bef Compare August 12, 2025 21:33
@claude
Copy link
Contributor

claude bot commented Aug 12, 2025

Claude finished @theochap's task —— View job


PR Review Complete ✅

I've completed my review of this PR which extends network actor tests with block propagation and reduces test flakiness. Here's my comprehensive feedback:

Todo List:

  • Read and understand the main networking changes
  • Review new test infrastructure and block generation logic
  • Analyze code quality and best practices adherence
  • Check for potential bugs or issues
  • Evaluate performance considerations
  • Assess security concerns
  • Review test coverage improvements
  • Provide comprehensive feedback

Review Feedback

Strengths

  1. Excellent Solution to Port Conflicts: The use of kernel-allocated ports (port 0) in bin/node/src/flags/p2p.rs:73-76 and crates/node/service/tests/actors/network/mocks/builder.rs:80 effectively eliminates test flakiness from port conflicts. This is a best practice for testing.

  2. Comprehensive Test Coverage: The new test suite covers multiple important scenarios:

    • Simple P2P connections (test_p2p_network_conn)
    • Large network topology testing (test_large_network_conn)
    • Block propagation from sequencer to validators (test_sequencer_network_conn)
    • Full network propagation testing (test_sequencer_network_propagation)
  3. Well-Structured Test Infrastructure: The refactoring from a single mocks.rs file to a proper module structure with mocks/mod.rs and mocks/builder.rs improves maintainability.

  4. Deterministic Block Generation: The SeedGenerator approach provides reproducible test blocks while maintaining randomness between test runs.

⚠️ Issues and Concerns

Code Quality Issues

  1. Excessive Memory Allocation (crates/node/service/tests/actors/generator/block_builder.rs:49):

    let data = self.random_bytes(1024 * 1024);  // 1MB allocation per block!

    This allocates 1MB of random data per test block, which is unnecessary and wasteful for testing.

  2. Potential ENR Update Race Condition (crates/node/service/src/actors/network/driver.rs:75):

    self.discovery.disc.update_local_enr_socket(SocketAddr::new(ip_address, port), true);

    The ENR update happens immediately after getting the gossip address. Consider if this needs proper sequencing or error handling.

  3. Unwrap Usage in Test Code (crates/node/service/tests/actors/network/mocks/builder.rs:44,47):

    .map_err(|e| anyhow::anyhow!("Impossible...")).unwrap()

    While acceptable in tests, the error messages suggest these cases are truly impossible, so consider using expect() with clearer messages.

Performance Considerations

  1. Large Test Networks: The test_large_network_conn creates 10 networks and tests full mesh connectivity (O(n²) connections). Consider if this is necessary or if a smaller subset would suffice.

  2. Timeout Configuration: The retry logic in is_connected_to_with_retries uses a 6-minute total delay, which may be excessive for CI environments.

Security Considerations

  1. Test-Only Restrictions: The discovery table filter in builder.rs:70-76 correctly restricts to loopback addresses for testing, preventing external connections.

  2. Proper Key Handling: The keypair generation and address derivation follows secure practices using secp256k1.

🔧 Recommendations

  1. Reduce Block Generation Memory Usage:

    // Instead of 1MB, use much smaller test data
    let data = self.random_bytes(1024); // 1KB should be sufficient
  2. Add Error Handling for ENR Updates:

    if let Err(e) = self.discovery.disc.update_local_enr_socket(...) {
        warn!("Failed to update ENR socket: {}", e);
    }
  3. Consider Parameterizing Test Network Size:

    const NETWORKS: usize = if cfg!(debug_assertions) { 5 } else { 10 };
  4. Add Timeout Configuration for CI:

    let timeout = Duration::from_secs(if env::var("CI").is_ok() { 60 } else { 360 });

Best Practices Followed

  • ✅ Proper separation of concerns with dedicated test modules
  • ✅ Use of kernel-allocated ports for test isolation
  • ✅ Comprehensive error handling with custom error types
  • ✅ Good use of async testing patterns
  • ✅ Proper cleanup with cancellation tokens
  • ✅ Following Rust naming conventions and documentation standards

📋 Test Coverage Assessment

The PR significantly improves test coverage by adding:

  • P2P connection establishment testing
  • Block propagation verification
  • Network topology testing
  • Sequencer-validator interaction testing

The test infrastructure is well-designed and should catch regressions effectively.

🎯 Overall Assessment

This is a solid PR that addresses real testing flakiness issues while adding valuable test coverage. The kernel-allocated ports solution is excellent, and the new test infrastructure is well-architected. The main concerns are around resource usage optimization and some edge case error handling, but these don't block the functionality.

Recommendation: Approve with minor optimizations suggested above.


Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull Request Overview

This PR extends the network actor testing infrastructure with comprehensive block propagation tests and reduces test flakiness by using kernel-allocated ports for discovery/gossip protocols.

  • Adds comprehensive sequencer network tests for block signature and propagation
  • Replaces static port allocation with kernel-allocated ports to avoid race conditions
  • Introduces random valid block building logic for payload generation

Reviewed Changes

Copilot reviewed 17 out of 19 changed files in this pull request and generated 5 comments.

Show a summary per file
File Description
examples/gossip/src/main.rs Enables ENR update for gossip command configuration
crates/node/service/tests/actors/utils.rs Removes legacy seed generator utilities (file deleted)
crates/node/service/tests/actors/network/sequencer.rs Adds new sequencer network connectivity and propagation tests
crates/node/service/tests/actors/network/p2p.rs Adds basic P2P network connectivity tests
crates/node/service/tests/actors/network/mod.rs Restructures network test module organization
crates/node/service/tests/actors/network/mocks/mod.rs Creates new test network mock infrastructure
crates/node/service/tests/actors/network/mocks/builder.rs Implements test network builder with configurable sequencer support
crates/node/service/tests/actors/network/mocks.rs Removes legacy test network mock implementation (file deleted)
crates/node/service/tests/actors/mod.rs Updates module structure to include generator utilities
crates/node/service/tests/actors/generator/seed.rs Adds improved seed generator with thread-safe atomic operations
crates/node/service/tests/actors/generator/mod.rs Creates generator module structure
crates/node/service/tests/actors/generator/block_builder.rs Implements random valid block generation for different payload versions
crates/node/service/src/actors/network/driver.rs Updates network driver to support ENR socket updates after gossip startup
crates/node/service/src/actors/network/config.rs Adds ENR update configuration option
crates/node/service/src/actors/network/builder.rs Enhances network builder with ENR update support
crates/node/service/Cargo.toml Adds consensus-related dependencies for test block generation
crates/node/gossip/src/driver.rs Updates gossip driver to return actual listen address after startup

Tip: Customize your code reviews with copilot-instructions.md. Create the file or learn how to get started.

}

fn next(&self) -> u64 {
self.0.fetch_add(1, Ordering::SeqCst)
Copy link

Copilot AI Aug 12, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Using Ordering::SeqCst is unnecessarily strong for a simple counter. Consider using Ordering::Relaxed since the exact ordering of seed generation doesn't require sequential consistency.

Suggested change
self.0.fetch_add(1, Ordering::SeqCst)
self.0.fetch_add(1, Ordering::Relaxed)

Copilot uses AI. Check for mistakes.
}

impl SeedGenerator {
/// Generate a random op execution payload.
Copy link

Copilot AI Aug 12, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The function lacks documentation explaining the purpose and behavior of generating different payload versions. Consider adding a docstring that describes what makes each version valid and how they differ.

Suggested change
/// Generate a random op execution payload.
/// Generate a random op execution payload for the specified payload version.
///
/// This function creates a random, valid `OpExecutionPayloadEnvelope` by constructing a block
/// compatible with the given `PayloadVersion`. Each version corresponds to a different set of
/// protocol features and header/body fields:
///
/// - `V1`: The minimal version, omits withdrawals, blob gas, parent beacon block root, and requests hash.
/// - `V2`: Adds withdrawals and the corresponding withdrawals root to the block.
/// - `V3`: Adds blob gas fields and parent beacon block root, in addition to withdrawals.
/// - `V4`: Currently identical to V3, but reserved for future protocol extensions.
///
/// This allows testing and validation of payload handling across different protocol upgrades.

Copilot uses AI. Check for mistakes.
/// We are setting up a linear network topology, and we check that the block propagates to every
/// block of the network.
#[tokio::test(flavor = "multi_thread")]
async fn test_sequencer_network_propagation() -> anyhow::Result<()> {
Copy link

Copilot AI Aug 12, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[nitpick] The magic number 10 should be documented or made configurable. Consider adding a comment explaining why this specific number of networks is used for the propagation test.

Suggested change
async fn test_sequencer_network_propagation() -> anyhow::Result<()> {
async fn test_sequencer_network_propagation() -> anyhow::Result<()> {
// Use 10 networks to ensure propagation is tested across a reasonably sized network.
// This value is chosen to balance test coverage and execution time.

Copilot uses AI. Check for mistakes.
}

#[tokio::test(flavor = "multi_thread")]
async fn test_large_network_conn() -> anyhow::Result<()> {
Copy link

Copilot AI Aug 12, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[nitpick] The magic number 10 should be documented or made configurable. Consider adding a comment explaining why this specific number of networks is used for the large network test.

Suggested change
async fn test_large_network_conn() -> anyhow::Result<()> {
async fn test_large_network_conn() -> anyhow::Result<()> {
// Use 10 networks to simulate a moderately large P2P network and ensure scalability.
// This value balances test coverage and execution time.

Copilot uses AI. Check for mistakes.
other: &Self,
) -> Result<(), TestNetworkError> {
(async || self.is_connected_to(other).await)
.retry(ExponentialBuilder::default().with_total_delay(Some(Duration::from_secs(360))))
Copy link

Copilot AI Aug 12, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[nitpick] The 360-second timeout is quite long and may slow down tests significantly if connections fail. Consider making this configurable or documenting why such a long timeout is necessary.

Suggested change
.retry(ExponentialBuilder::default().with_total_delay(Some(Duration::from_secs(360))))
///
/// The total retry timeout is configurable (default: 360 seconds). A long timeout may be necessary
/// in CI or slow environments, but can be reduced for faster feedback during development.
pub(super) async fn is_connected_to_with_retries(
&self,
other: &Self,
timeout_secs: Option<u64>,
) -> Result<(), TestNetworkError> {
let total_delay = Duration::from_secs(timeout_secs.unwrap_or(360));
(async || self.is_connected_to(other).await)
.retry(ExponentialBuilder::default().with_total_delay(Some(total_delay)))

Copilot uses AI. Check for mistakes.
@claude
Copy link
Contributor

claude bot commented Aug 12, 2025

Claude finished @theochap's task —— View job


PR Review Complete ✅

I've completed my review of this PR which extends network actor tests with block propagation and reduces test flakiness. Here's my comprehensive feedback:

Todo List:

  • Read and understand the main networking changes
  • Review new test infrastructure and block generation logic
  • Analyze code quality and best practices adherence
  • Check for potential bugs or issues
  • Evaluate performance considerations
  • Assess security concerns
  • Review test coverage improvements
  • Provide comprehensive feedback

Review Feedback

Strengths

  1. Excellent Solution to Port Conflicts: The use of kernel-allocated ports (port 0) in bin/node/src/flags/p2p.rs:73-76 and crates/node/service/tests/actors/network/mocks/builder.rs:80 effectively eliminates test flakiness from port conflicts. This is a best practice for testing.

  2. Comprehensive Test Coverage: The new test suite covers multiple important scenarios:

    • Simple P2P connections (test_p2p_network_conn)
    • Large network topology testing (test_large_network_conn)
    • Block propagation from sequencer to validators (test_sequencer_network_conn)
    • Full network propagation testing (test_sequencer_network_propagation)
  3. Well-Structured Test Infrastructure: The refactoring from a single mocks.rs file to a proper module structure with mocks/mod.rs and mocks/builder.rs improves maintainability.

  4. Deterministic Block Generation: The SeedGenerator approach provides reproducible test blocks while maintaining randomness between test runs.

⚠️ Issues and Concerns

Code Quality Issues

  1. Excessive Memory Allocation (crates/node/service/tests/actors/generator/block_builder.rs:49):

    let data = self.random_bytes(1024 * 1024);  // 1MB allocation per block!

    This allocates 1MB of random data per test block, which is unnecessary and wasteful for testing.

  2. Potential ENR Update Race Condition (crates/node/service/src/actors/network/driver.rs:75):

    self.discovery.disc.update_local_enr_socket(SocketAddr::new(ip_address, port), true);

    The ENR update happens immediately after getting the gossip address. Consider if this needs proper sequencing or error handling.

  3. Unwrap Usage in Test Code (crates/node/service/tests/actors/network/mocks/builder.rs:44,47):

    .map_err(|e| anyhow::anyhow!("Impossible...")).unwrap()

    While acceptable in tests, the error messages suggest these cases are truly impossible, so consider using expect() with clearer messages.

Performance Considerations

  1. Large Test Networks: The test_large_network_conn creates 10 networks and tests full mesh connectivity (O(n²) connections). Consider if this is necessary or if a smaller subset would suffice.

  2. Timeout Configuration: The retry logic in is_connected_to_with_retries uses a 6-minute total delay, which may be excessive for CI environments.

Security Considerations

  1. Test-Only Restrictions: The discovery table filter in builder.rs:70-76 correctly restricts to loopback addresses for testing, preventing external connections.

  2. Proper Key Handling: The keypair generation and address derivation follows secure practices using secp256k1.

🔧 Recommendations

  1. Reduce Block Generation Memory Usage:

    // Instead of 1MB, use much smaller test data
    let data = self.random_bytes(1024); // 1KB should be sufficient
  2. Add Error Handling for ENR Updates:

    if let Err(e) = self.discovery.disc.update_local_enr_socket(...) {
        warn!("Failed to update ENR socket: {}", e);
    }
  3. Consider Parameterizing Test Network Size:

    const NETWORKS: usize = if cfg!(debug_assertions) { 5 } else { 10 };
  4. Add Timeout Configuration for CI:

    let timeout = Duration::from_secs(if env::var("CI").is_ok() { 60 } else { 360 });

Best Practices Followed

  • ✅ Proper separation of concerns with dedicated test modules
  • ✅ Use of kernel-allocated ports for test isolation
  • ✅ Comprehensive error handling with custom error types
  • ✅ Good use of async testing patterns
  • ✅ Proper cleanup with cancellation tokens
  • ✅ Following Rust naming conventions and documentation standards

📋 Test Coverage Assessment

The PR significantly improves test coverage by adding:

  • P2P connection establishment testing
  • Block propagation verification
  • Network topology testing
  • Sequencer-validator interaction testing

The test infrastructure is well-designed and should catch regressions effectively.

🎯 Overall Assessment

This is a solid PR that addresses real testing flakiness issues while adding valuable test coverage. The kernel-allocated ports solution is excellent, and the new test infrastructure is well-architected. The main concerns are around resource usage optimization and some edge case error handling, but these don't block the functionality.

Recommendation: Approve with minor optimizations suggested above.


@theochap theochap added this pull request to the merge queue Aug 12, 2025
Merged via the queue into main with commit 267c210 Aug 12, 2025
32 of 33 checks passed
@theochap theochap deleted the theo/kernel-allocated-ports branch August 12, 2025 22:46
theochap added a commit to ethereum-optimism/optimism that referenced this pull request Dec 10, 2025
…uce test flakiness (op-rs/kona#2693)

## Description

This PR achieves the following goals:
- Extend the simple p2p connection tests of op-rs/kona#2679 with more
comprehensive tests involving block signature and propagation (a follow
PR will add the cases for higher block versions, it requires updating
the rollup config with hardfork times)
- Fixes the network bootstrap logic to remove flakiness. In particular,
it uses kernel allocated ports for discovery/gossip to avoid static
modification races.
- It adds random valid block building logic
theochap added a commit to ethereum-optimism/optimism that referenced this pull request Jan 14, 2026
…uce test flakiness (op-rs/kona#2693)

## Description

This PR achieves the following goals:
- Extend the simple p2p connection tests of #2679 with more
comprehensive tests involving block signature and propagation (a follow
PR will add the cases for higher block versions, it requires updating
the rollup config with hardfork times)
- Fixes the network bootstrap logic to remove flakiness. In particular,
it uses kernel allocated ports for discovery/gossip to avoid static
modification races.
- It adds random valid block building logic
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.

Labels

A-node Area: cl node (eq. Go op-node) handles single-chain consensus A-p2p Area: p2p M-tests Meta: Testing related

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants