
fix(p2p): graceful shutdown and DHT routing table fix #26

Merged
mickvandijke merged 7 commits into main from fix/p2p-shutdown-and-dht-routing
Feb 12, 2026

Conversation

@mickvandijke
Collaborator

Summary

  • Graceful shutdown: Replace AtomicBool shutdown flags with CancellationToken across P2PNode, DhtNetworkManager, and TransportHandle for clean, cooperative shutdown. Adds tokio-util dependency. Consolidates scattered shutdown logic and eliminates "channel closed" noise during teardown.
  • DHT routing revert: Revert participation-only routing table promotion (ff30dab) — peers are once again promoted on any valid interaction, not just DHT operations, restoring proper routing table population.

Test plan

  • cargo test — all existing tests pass
  • cargo clippy -- -D warnings -D clippy::unwrap_used -D clippy::expect_used — no new warnings
  • Manual: start a node, confirm clean shutdown without "channel closed" errors
  • Manual: verify routing table populates from non-DHT peer interactions

🤖 Generated with Claude Code

Copilot AI review requested due to automatic review settings February 12, 2026 13:09
Contributor

Copilot AI left a comment

Pull request overview

Update P2P shutdown semantics to use cooperative cancellation, and adjust DHT routing table promotion behavior to restore routing population from general peer interactions.

Changes:

  • Replace AtomicBool shutdown flags with tokio_util::sync::CancellationToken across node/transport/DHT components.
  • Refactor shutdown teardown to reduce “channel closed”/join noise and improve task cancellation handling.
  • Rework DHT routing promotion logic (and remove participation-only promotion + its tests).

Reviewed changes

Copilot reviewed 7 out of 7 changed files in this pull request and generated 5 comments.

Show a summary per file
File Description
tests/network_wiring_e2e_test.rs Updates e2e test to match P2PNode::is_running() API change.
src/transport_handle.rs Converts transport shutdown to CancellationToken and centralizes join/teardown logic.
src/transport/ant_quic_adapter.rs Updates recv task shutdown checks to use CancellationToken.
src/network.rs Replaces running state with a node-level CancellationToken and updates run/stop/is_running.
src/dht_network_manager.rs Switches to CancellationToken and revises peer tracking/routing promotion behavior; removes old promotion tests.
src/dht/core_engine.rs Switches core maintenance shutdown to CancellationToken and select!-based exit.
Cargo.toml Adds tokio-util dependency to support CancellationToken.

Comment thread src/network.rs
Comment on lines +1310 to 1312
pub fn is_running(&self) -> bool {
!self.shutdown.is_cancelled()
}
Copilot AI Feb 12, 2026

is_running() is now derived solely from shutdown.is_cancelled(), but shutdown is initialized as a fresh (non-cancelled) token in new(). That makes a newly constructed node report is_running() == true even before start() is called, which contradicts the semantics used by the existing tests (e.g., test_p2pnode_initialization). Consider reintroducing an explicit running/started state, or storing the shutdown token in a replaceable slot (e.g., RwLock<CancellationToken>) so start() creates a new token and stop() cancels it; then is_running() can reflect “started && not cancelled”.

Comment thread src/transport/ant_quic_adapter.rs Outdated
Comment on lines 273 to 277
loop {
-if shutdown.load(Ordering::Relaxed) {
+if shutdown.is_cancelled() {
break;
}
match transport.endpoint().recv().await {
Copilot AI Feb 12, 2026

The recv loop checks shutdown.is_cancelled() only before awaiting endpoint().recv().await, so shutdown may block indefinitely if no packets arrive. Use tokio::select! to await either shutdown.cancelled() or transport.endpoint().recv() so the task can exit promptly on shutdown.

Comment on lines +2169 to 2171
/// Update peer information
async fn update_peer_info(&self, peer_id: PeerId, _message: &DhtNetworkMessage) {
let dht_key = crate::dht::derive_dht_key_from_peer_id(&peer_id);
Copilot AI Feb 12, 2026

update_peer_info() no longer promotes peers into the routing table at all (the previous promotion block was removed), but it still populates DhtPeerInfo with addresses. Combined with the current handle_peer_connected() logic (which only adds to the routing table when addresses are newly discovered), a peer first seen via DHT traffic can end up never being added to the routing table because it already has addresses by the time it connects. To restore “promote on any valid interaction,” either (a) re-add routing table insertion in update_peer_info() when an address is available, or (b) make handle_peer_connected() attempt add_node whenever the peer is connected and has an address (ideally with an explicit ‘already added’ guard or an idempotent/‘already present’ handling path).

Comment on lines +2235 to +2248
// Track peer and decide whether it should be promoted to routing table.
let should_add_to_routing = {
let mut peers = self.dht_peers.write().await;
match peers.entry(peer_id.clone()) {
std::collections::hash_map::Entry::Occupied(mut entry) => {
let peer_info = entry.get_mut();
let had_addresses = !peer_info.addresses.is_empty();
peer_info.last_seen = Instant::now();
peer_info.is_connected = true;
if !addresses.is_empty() {
peer_info.addresses = addresses.clone();
}
!addresses.is_empty() && !had_addresses
}
Copilot AI Feb 12, 2026

The should_add_to_routing heuristic is based on “addresses became non-empty,” which conflates ‘has enough info to add’ with ‘not yet added’. Since in_routing_table was removed, there’s no durable way to prevent duplicate adds or to retry after a prior add_node failure, and there’s also a risk of never adding peers that already had addresses from earlier tracking. Consider restoring an explicit routing-membership flag (or querying the core engine for presence) so the decision is based on actual routing table state rather than address transitions.
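One way to base promotion on routing membership rather than address transitions, sketched with plain `std` types. `PeerInfo`, `add_node`, and `handle_peer_connected` here are hypothetical simplifications (peer ids and addresses are `String`s, the routing table is a `Vec`); the `fail` flag only simulates an `add_node` error so the retry path is visible. This sketch does not refresh the routing entry when an existing member reconnects from a new address.

```rust
use std::collections::HashMap;

/// Hypothetical peer record; `in_routing_table` is the explicit membership flag.
#[derive(Debug, Default)]
pub struct PeerInfo {
    pub addresses: Vec<String>,
    pub is_connected: bool,
    pub in_routing_table: bool,
}

/// Hypothetical stand-in for the core engine's add_node; `fail` simulates an error.
pub fn add_node(routing: &mut Vec<String>, addr: &str, fail: bool) -> Result<(), String> {
    if fail {
        return Err(format!("could not add {addr}"));
    }
    routing.push(addr.to_string());
    Ok(())
}

/// Returns true if this call added the peer to the routing table.
pub fn handle_peer_connected(
    peers: &mut HashMap<String, PeerInfo>,
    routing: &mut Vec<String>,
    peer_id: &str,
    addresses: Vec<String>,
    fail_add: bool,
) -> bool {
    let info = peers.entry(peer_id.to_string()).or_default();
    info.is_connected = true;
    if !addresses.is_empty() {
        info.addresses = addresses;
    }
    // Addresses may have been learned earlier (e.g. from DHT traffic); what
    // matters is whether we have one now, not whether it is new.
    let Some(addr) = info.addresses.first().cloned() else {
        return false;
    };
    if info.in_routing_table {
        return false;
    }
    match add_node(routing, &addr, fail_add) {
        Ok(()) => {
            info.in_routing_table = true;
            true
        }
        // The flag stays false, so the next connection retries the add.
        Err(_) => false,
    }
}
```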

@@ -2713,69 +2686,3 @@ impl Default for DhtNetworkConfig {
}
}
}
Copilot AI Feb 12, 2026

This PR removes the existing unit tests that exercised routing promotion behavior (the deleted should_promote_to_routing tests), but the routing promotion logic is still being modified substantially. Add new tests that verify the intended behavior after the revert: peers are promoted on non-DHT interactions / general connectivity (e.g., via PeerConnected handling) and that PeerDiscovered is emitted appropriately.

@greptile-apps

greptile-apps bot commented Feb 12, 2026

Greptile Overview

Greptile Summary

Replaces AtomicBool shutdown flags with CancellationToken across P2PNode, DhtNetworkManager, TransportHandle, and background tasks for cooperative shutdown. Reverts participation-only routing table promotion—peers are now added to DHT routing on any connection rather than only on DHT operations.

Major changes:

  • Added tokio-util dependency for CancellationToken support
  • Refactored shutdown coordination using tokio::select! in all background task loops
  • Simplified is_running() to synchronous check of cancellation token
  • Removed in_routing_table field from DhtPeerInfo struct
  • Deleted unit tests for participation-only routing promotion logic
  • Consolidated task join logic with helper methods in TransportHandle

Issues found:

  • Race condition in spawn_recv_task (src/transport/ant_quic_adapter.rs:273): shutdown check happens before blocking recv() call, preventing immediate cancellation
  • Potential duplicate routing table additions in handle_peer_connected when peer addresses change between connections

Confidence Score: 3/5

  • PR improves shutdown handling but contains two logical issues that need resolution before merge
  • The CancellationToken refactoring is well-executed across most files, but the recv_task in ant_quic_adapter has a race condition that could delay shutdown, and the DHT routing promotion logic may cause duplicate adds. The removal of unit tests without replacement reduces confidence in the routing table behavior changes.
  • Pay close attention to src/transport/ant_quic_adapter.rs (race condition) and src/dht_network_manager.rs (routing table promotion logic)

Important Files Changed

Filename Overview
src/dht_network_manager.rs Major changes: replaced AtomicBool with CancellationToken, reverted participation-only routing promotion (peers now promoted on any connection), removed unit tests. Potential duplicate routing table additions.
src/network.rs Replaced RwLock<bool> with CancellationToken for shutdown, made is_running() synchronous, updated run loop to use tokio::select!
src/transport/ant_quic_adapter.rs Updated spawn_recv_task to accept CancellationToken; race condition where shutdown check before blocking recv() won't exit until next message
src/transport_handle.rs Replaced AtomicBool with CancellationToken, refactored shutdown with helper methods for joining tasks, updated all background tasks to use tokio::select!

Sequence Diagram

sequenceDiagram
    participant User
    participant P2PNode
    participant DhtNetworkManager
    participant TransportHandle
    participant BackgroundTasks
    
    User->>P2PNode: stop()
    P2PNode->>P2PNode: shutdown.cancel()
    Note over P2PNode: Signals run loop to exit
    
    P2PNode->>DhtNetworkManager: stop()
    DhtNetworkManager->>DhtNetworkManager: shutdown.cancel()
    DhtNetworkManager->>DhtNetworkManager: dht.signal_shutdown()
    Note over DhtNetworkManager: Cancels all DHT background tasks
    
    P2PNode->>TransportHandle: stop()
    TransportHandle->>TransportHandle: shutdown.cancel()
    TransportHandle->>TransportHandle: dual_node.shutdown_endpoints()
    
    par Join all background tasks
        TransportHandle->>BackgroundTasks: join recv tasks
        TransportHandle->>BackgroundTasks: join listener task
        TransportHandle->>BackgroundTasks: join connection monitor
        TransportHandle->>BackgroundTasks: join keepalive task
        TransportHandle->>BackgroundTasks: join maintenance task
    end
    
    BackgroundTasks-->>TransportHandle: Tasks complete via tokio::select!
    
    TransportHandle->>TransportHandle: disconnect_all_peers()
    TransportHandle-->>P2PNode: Transport stopped
    
    P2PNode-->>User: Node stopped gracefully


@greptile-apps greptile-apps bot left a comment

7 files reviewed, 2 comments


Comment thread src/transport/ant_quic_adapter.rs Outdated
Comment on lines 273 to 276
loop {
-if shutdown.load(Ordering::Relaxed) {
+if shutdown.is_cancelled() {
break;
}

Race condition: shutdown check happens before recv(), but recv() can block indefinitely. If cancellation occurs while waiting in recv(), the task won't exit until the next message arrives.

Use tokio::select! to properly handle cancellation:

Suggested change
-loop {
-if shutdown.load(Ordering::Relaxed) {
-if shutdown.is_cancelled() {
-break;
-}
+loop {
+    tokio::select! {
+        () = shutdown.cancelled() => {
+            break;
+        }
+        result = transport.endpoint().recv() => {
+            match result {

Comment on lines +2240 to +2247
let peer_info = entry.get_mut();
let had_addresses = !peer_info.addresses.is_empty();
peer_info.last_seen = Instant::now();
peer_info.is_connected = true;
if !addresses.is_empty() {
peer_info.addresses = addresses.clone();
}
!addresses.is_empty() && !had_addresses

Logic issue with routing promotion: if a peer connects multiple times, or if addresses change from one non-empty list to a different non-empty list, had_addresses will be true and the condition !addresses.is_empty() && !had_addresses will be false, preventing re-promotion. However, if addresses change (e.g., the peer reconnects from a different IP), we'll attempt to add the peer again on line 2292, which may cause duplicate entries or errors.

Consider tracking whether the peer is already in the routing table to avoid duplicate add_node calls.

mickvandijke and others added 2 commits February 12, 2026 14:36
…hutdown

Replace check-then-await pattern with tokio::select! to race the
cancellation token against recv(), eliminating a race condition where
shutdown could be missed between the is_cancelled() check and blocking
on recv().

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Copilot AI review requested due to automatic review settings February 12, 2026 13:54
@mickvandijke
Collaborator Author

@greptile-apps @claude please re-review

Contributor

Copilot AI left a comment

Pull request overview

Copilot reviewed 7 out of 7 changed files in this pull request and generated 2 comments.

-/// so short-lived request clients are not inserted as DHT routing peers.
-async fn update_peer_info(&self, peer_id: PeerId, message: &DhtNetworkMessage) {
+/// Update peer information
+async fn update_peer_info(&self, peer_id: PeerId, _message: &DhtNetworkMessage) {
Copilot AI Feb 12, 2026

The parameter _message is prefixed with an underscore but never used in the function body. Since this parameter was previously used to determine routing promotion but is no longer needed after the revert, it should be removed from the function signature entirely.

Suggested change
-async fn update_peer_info(&self, peer_id: PeerId, _message: &DhtNetworkMessage) {
+async fn update_peer_info(&self, peer_id: PeerId, _: &DhtNetworkMessage) {

avg_latency: Duration::from_millis(50),
reliability_score: 1.0,
});
!addresses.is_empty()
Copilot AI Feb 12, 2026

The logic returns true when a new peer is added with addresses, but on line 2247 it returns true only when addresses were empty before and are now non-empty (!addresses.is_empty() && !had_addresses). This inconsistency means new peers will always be promoted even if they shouldn't be, while existing peers will only be promoted when transitioning from no addresses to having addresses. The logic should be consistent: either both cases should check !had_addresses or both should simply check !addresses.is_empty().

@greptile-apps

greptile-apps bot commented Feb 12, 2026

Greptile Overview

Greptile Summary

Replaced scattered AtomicBool shutdown flags with CancellationToken across P2PNode, DhtNetworkManager, TransportHandle, and background tasks for cooperative shutdown coordination. Reverted participation-only routing table promotion (commit ff30dab) — peers are now promoted to the DHT routing table on any connection event, not just DHT participation operations like Put/Join.

Key improvements:

  • All background tasks now use tokio::select! to respond immediately to cancellation instead of polling shutdown flags
  • Eliminates "channel closed" errors during teardown by ensuring proper task coordination
  • Simplified is_running() from async to sync method since it only checks token state
  • Reduced code duplication with helper methods for task cleanup in transport_handle.rs:1033-1057

DHT routing revert:

  • Removed in_routing_table field from DhtPeerInfo
  • Removed should_promote_to_routing() message filtering
  • Peers now added to routing table in handle_peer_connected() on first connection with valid addresses

Confidence Score: 4/5

  • Safe to merge with minor concern about duplicate routing table entries
  • Shutdown refactor is well-implemented and follows best practices. The DHT routing table revert restores proper peer discovery, but the logic in handle_peer_connected() at src/dht_network_manager.rs:2247 may allow duplicate add_node calls if peer addresses change during reconnection (see existing thread comment)
  • src/dht_network_manager.rs - review the routing table promotion logic around line 2247

Important Files Changed

Filename Overview
src/network.rs Replaced RwLock<bool> with CancellationToken for shutdown signaling; simplified is_running() to non-async; refactored run loop to use tokio::select! for clean cancellation
src/dht_network_manager.rs Replaced AtomicBool with CancellationToken; reverted participation-only routing (peers now promoted on any connection); removed in_routing_table field from DhtPeerInfo; refactored event handler to use tokio::select!
src/transport_handle.rs Replaced AtomicBool with CancellationToken for shutdown coordination; refactored all background tasks (keepalive, periodic maintenance, connection monitor) to use tokio::select!; added helper methods to reduce code duplication in task cleanup
src/transport/ant_quic_adapter.rs Updated spawn_recv_task to accept CancellationToken instead of AtomicBool; implemented proper cancellation with tokio::select! to handle shutdown immediately during blocking recv()

Sequence Diagram

sequenceDiagram
    participant User
    participant P2PNode
    participant DhtNetworkManager
    participant TransportHandle
    participant AntQuicAdapter
    participant BackgroundTasks

    User->>P2PNode: stop()
    activate P2PNode
    P2PNode->>P2PNode: shutdown.cancel()
    Note over P2PNode: CancellationToken signals shutdown
    
    P2PNode->>DhtNetworkManager: stop()
    activate DhtNetworkManager
    DhtNetworkManager->>DhtNetworkManager: shutdown.cancel()
    DhtNetworkManager->>BackgroundTasks: select! cancelled()
    BackgroundTasks-->>DhtNetworkManager: tasks exit cleanly
    DhtNetworkManager-->>P2PNode: Ok
    deactivate DhtNetworkManager
    
    P2PNode->>TransportHandle: stop()
    activate TransportHandle
    TransportHandle->>TransportHandle: shutdown.cancel()
    TransportHandle->>BackgroundTasks: select! cancelled()
    BackgroundTasks-->>TransportHandle: tasks exit cleanly
    TransportHandle->>AntQuicAdapter: shutdown_endpoints()
    activate AntQuicAdapter
    AntQuicAdapter->>BackgroundTasks: recv tasks via select!
    BackgroundTasks-->>AntQuicAdapter: exit immediately
    AntQuicAdapter-->>TransportHandle: Ok
    deactivate AntQuicAdapter
    TransportHandle-->>P2PNode: Ok
    deactivate TransportHandle
    
    P2PNode-->>User: Ok
    deactivate P2PNode
    Note over User,BackgroundTasks: No "channel closed" errors

@claude

claude bot commented Feb 12, 2026

Claude finished @mickvandijke's task in 2m 30s


Re-review of PR #26

Tasks:

  • Read all changed files
  • Analyze CancellationToken shutdown implementation
  • Check if previous review issues were addressed
  • Review DHT routing table promotion logic
  • Provide comprehensive feedback

Summary

This PR makes good progress on improving shutdown semantics with the CancellationToken refactoring, but the DHT routing table promotion logic has issues that need to be addressed before merging.


✅ What's Working Well

1. CancellationToken Migration

The refactoring from AtomicBool to CancellationToken is well-executed:

  • src/transport/ant_quic_adapter.rs:273-299: ✅ FIXED - Now properly uses tokio::select! to race shutdown against recv(), eliminating the race condition identified in previous reviews
  • src/network.rs: ✅ Clean implementation with tokio::select! in run loop
  • src/dht/core_engine.rs: ✅ Proper tokio::select! in maintenance tasks
  • src/transport_handle.rs: ✅ Good helper methods for joining tasks
  • src/dht_network_manager.rs: ✅ Event handler properly uses tokio::select!

2. Code Quality

  • No uses of .unwrap(), .expect(), or panic!()
  • Proper error handling with Result types ✅
  • Good use of tracing instead of println!
  • Consistent shutdown patterns across all background tasks ✅

🚨 Issues Found

Issue #1: Inconsistent Routing Table Promotion Logic ⚠️

Location: src/dht_network_manager.rs:2236-2262

The logic for deciding when to add peers to the routing table is inconsistent:

For existing peers (line 2247):

!addresses.is_empty() && !had_addresses  // Only on address transition

For new peers (line 2259):

!addresses.is_empty()  // Always if addresses present

Problems:

  1. New peers: Added to routing table immediately if they have addresses ✅
  2. Existing peers: Only added when transitioning from "no addresses" → "has addresses" ❌

Scenario that fails:

1. Peer connects, DhtPeerInfo created with addresses from transport
2. handle_peer_connected() called
3. Peer already has addresses, so had_addresses = true
4. Condition fails: !addresses.is_empty() && !had_addresses = true && false = false
5. Peer is NEVER added to routing table despite being connected!

This contradicts the PR's goal: "peers are once again promoted on any valid interaction"

Recommendation: Make routing promotion consistent. Either:

Option A - Always promote connected peers with addresses:

// For both new and existing peers
if !addresses.is_empty() {
    should_add_to_routing = true;
}

Option B - Track routing table membership explicitly:

// Add to DhtPeerInfo
in_routing_table: bool

// Then check:
!addresses.is_empty() && !peer_info.in_routing_table

Issue #2: Unused Parameter ⚠️

Location: src/dht_network_manager.rs:2170

async fn update_peer_info(&self, peer_id: PeerId, _message: &DhtNetworkMessage) {

The _message parameter is unused after removing the participation-only promotion logic.

Recommendation: Remove the parameter and update all call sites:

async fn update_peer_info(&self, peer_id: PeerId) {

Issue #3: Missing Test Coverage ⚠️

Location: The PR removes old routing promotion tests but doesn't add new ones

Per review comment at src/dht_network_manager.rs:2688, the PR deleted unit tests for the participation-only logic but didn't add tests for the reverted behavior.

Needed tests:

#[tokio::test]
async fn test_peer_promoted_on_connection() {
    // Verify peers are added to routing table on PeerConnected
}

#[tokio::test]
async fn test_peer_with_initial_addresses_promoted() {
    // Verify peers that have addresses from the start are still promoted
}

#[tokio::test]
async fn test_peer_discovered_event_emitted() {
    // Verify PeerDiscovered events work correctly
}

📋 Detailed File Review

src/transport/ant_quic_adapter.rs

  • Lines 273-299: ✅ spawn_recv_task now properly uses tokio::select! - race condition FIXED
  • Overall: Clean implementation, no issues

src/network.rs

  • Line 1311: ✅ is_running() correctly uses !shutdown.is_cancelled()
  • Overall: Good refactoring, shutdown semantics are clear

src/dht_network_manager.rs

  • Line 2170: ⚠️ Unused _message parameter should be removed
  • Lines 2236-2262: ⚠️ Routing promotion logic is inconsistent (see Issue #1)
  • Line 2247: ❌ Existing peers with addresses won't be promoted
  • Line 2688: ⚠️ Missing test coverage for new behavior
  • Overall: Shutdown refactoring is good, but routing logic needs fixes

src/transport_handle.rs

  • Overall: ✅ Excellent refactoring with good helper methods for task management

src/dht/core_engine.rs

  • Overall: ✅ Clean use of CancellationToken in maintenance tasks

🎯 Recommendations Before Merge

Must Fix:

  1. Fix routing promotion logic to match stated goals - peers should be promoted on connection, not just address transitions
  2. Add test coverage for the reverted routing promotion behavior

Should Fix:

  1. Remove unused _message parameter from update_peer_info()

💡 Suggested Fix for Issue #1

Replace lines 2236-2262 with:

// Track peer and update addresses
{
    let mut peers = self.dht_peers.write().await;
    match peers.entry(peer_id.clone()) {
        std::collections::hash_map::Entry::Occupied(mut entry) => {
            let peer_info = entry.get_mut();
            peer_info.last_seen = Instant::now();
            peer_info.is_connected = true;
            if !addresses.is_empty() {
                peer_info.addresses = addresses.clone();
            }
        }
        std::collections::hash_map::Entry::Vacant(entry) => {
            entry.insert(DhtPeerInfo {
                peer_id: peer_id.clone(),
                dht_key,
                addresses: addresses.clone(),
                last_seen: Instant::now(),
                is_connected: true,
                avg_latency: Duration::from_millis(50),
                reliability_score: 1.0,
            });
        }
    }
}

// Skip peers with no addresses - they cannot be used for DHT routing.
let address_str = match addresses.first() {
    Some(addr) => addr.to_string(),
    None => {
        warn!("Peer {} has no addresses, skipping DHT routing table addition", peer_id);
        return;
    }
};

// Always try to add connected peers to routing table (idempotent operation)
{
    use crate::dht::core_engine::{NodeCapacity, NodeId, NodeInfo};

    let node_info = NodeInfo {
        id: NodeId::from_bytes(dht_key),
        address: address_str,
        last_seen: SystemTime::now(),
        capacity: NodeCapacity::default(),
    };

    match self.dht.write().await.add_node(node_info).await {
        Ok(_) => info!("Added/updated peer {} in DHT routing table", peer_id),
        Err(e) => warn!("Failed to add peer {} to DHT routing table: {}", peer_id, e),
    }
}

This ensures all connected peers with addresses are promoted to the routing table, matching the PR's stated goal.
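The suggestion relies on add_node being idempotent. Whether the core engine's add_node behaves that way is not confirmed in this thread; the sketch below only illustrates the property with a hypothetical `RoutingTable` keyed by a 32-byte DHT key, where a repeated add is an upsert. It ignores k-bucket capacity and eviction, which a real Kademlia table has to handle.

```rust
use std::collections::HashMap;

#[derive(Debug, PartialEq, Eq)]
pub enum AddResult {
    Inserted,
    Updated,
}

/// Hypothetical routing table keyed by a 32-byte DHT key.
#[derive(Default)]
pub struct RoutingTable {
    nodes: HashMap<[u8; 32], String>,
}

impl RoutingTable {
    /// Upsert: adding a known key refreshes its address instead of creating a
    /// duplicate, so calling this on every connection is safe.
    pub fn add_node(&mut self, key: [u8; 32], address: String) -> AddResult {
        match self.nodes.insert(key, address) {
            None => AddResult::Inserted,
            Some(_previous) => AddResult::Updated,
        }
    }

    pub fn len(&self) -> usize {
        self.nodes.len()
    }

    pub fn is_empty(&self) -> bool {
        self.nodes.is_empty()
    }

    pub fn address_of(&self, key: &[u8; 32]) -> Option<&str> {
        self.nodes.get(key).map(String::as_str)
    }
}
```

With this property, the "peer reconnects from a different IP" case raised earlier in the thread becomes an address refresh rather than a duplicate entry.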


📊 Overall Assessment

Strengths:

  • ✅ Shutdown refactoring is well-done and consistent
  • ✅ Code quality is high
  • ✅ Race condition in spawn_recv_task is fixed

Weaknesses:

  • ❌ Routing table promotion logic doesn't match stated goals
  • ❌ Missing test coverage for reverted behavior
  • ⚠️ Minor cleanup needed (unused parameter)

Verdict: Request changes - The shutdown refactoring is excellent, but the DHT routing promotion logic needs to be fixed to properly promote all connected peers, not just those transitioning from no-addresses to has-addresses.


Branch: fix/p2p-shutdown-and-dht-routing

mickvandijke and others added 2 commits February 12, 2026 15:11
…rt()

is_running() only checked !shutdown.is_cancelled(), which is true for a
freshly constructed CancellationToken. Add an is_started AtomicBool that
is set on start() and cleared on stop() so the method correctly reports
false before the node is started.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Copilot AI review requested due to automatic review settings February 12, 2026 14:31
Contributor

Copilot AI left a comment

Pull request overview

Copilot reviewed 7 out of 7 changed files in this pull request and generated 4 comments.

Comment thread src/network.rs
Comment on lines 1286 to 1293
pub async fn stop(&self) -> Result<()> {
info!("Stopping P2P node...");

-// Set running state to false
-*self.running.write().await = false;
+// Signal the run loop to exit
+self.shutdown.cancel();

// Stop DHT manager first so leave messages can be sent while transport is still active.
self.dht_manager.stop().await?;
Copilot AI Feb 12, 2026

CancellationToken cannot be “uncancelled”, but P2PNode stores a single token created in new() and cancels it in stop(). After the first stop, is_running() will always return false and run() will immediately exit even if start() is called again (since start() does not create a fresh token). If restart is intended, store the token behind interior mutability (e.g., RwLock<CancellationToken> / Mutex<CancellationToken> / ArcSwap) and replace it with a new token in start() (or create a per-run child token). If restart is not intended, consider making that explicit by returning an error on subsequent start() calls (or redesigning the API so stop() consumes the instance).

Comment thread src/network.rs
Comment on lines +1320 to 1322
pub fn is_running(&self) -> bool {
self.is_started.load(std::sync::atomic::Ordering::Acquire) && !self.shutdown.is_cancelled()
}
Copilot AI Feb 12, 2026

CancellationToken cannot be “uncancelled”, but P2PNode stores a single token created in new() and cancels it in stop(). After the first stop, is_running() will always return false and run() will immediately exit even if start() is called again (since start() does not create a fresh token). If restart is intended, store the token behind interior mutability (e.g., RwLock<CancellationToken> / Mutex<CancellationToken> / ArcSwap) and replace it with a new token in start() (or create a per-run child token). If restart is not intended, consider making that explicit by returning an error on subsequent start() calls (or redesigning the API so stop() consumes the instance).

Comment on lines +2235 to +2279
// Track peer and decide whether it should be promoted to routing table.
let should_add_to_routing = {
let mut peers = self.dht_peers.write().await;
match peers.entry(peer_id.clone()) {
std::collections::hash_map::Entry::Occupied(mut entry) => {
let peer_info = entry.get_mut();
let had_addresses = !peer_info.addresses.is_empty();
peer_info.last_seen = Instant::now();
peer_info.is_connected = true;
if !addresses.is_empty() {
peer_info.addresses = addresses.clone();
}
!addresses.is_empty() && !had_addresses
}
std::collections::hash_map::Entry::Vacant(entry) => {
entry.insert(DhtPeerInfo {
peer_id: peer_id.clone(),
dht_key,
addresses: addresses.clone(),
last_seen: Instant::now(),
is_connected: true,
avg_latency: Duration::from_millis(50),
reliability_score: 1.0,
});
!addresses.is_empty()
}
}
std::collections::hash_map::Entry::Vacant(entry) => {
entry.insert(DhtPeerInfo {
peer_id: peer_id.clone(),
dht_key,
addresses,
last_seen: Instant::now(),
is_connected: true,
avg_latency: Duration::from_millis(50),
reliability_score: 1.0,
in_routing_table: false,
});
};

// Skip peers with no addresses - they cannot be used for DHT routing.
let address_str = match addresses.first() {
Some(addr) => addr.to_string(),
None => {
warn!(
"Peer {} has no addresses, skipping DHT routing table addition",
peer_id
);
return;
}
};

debug!(
"Peer {} tracked for DHT liveness; routing promotion deferred until non-lookup DHT participation",
peer_id
);
}
if !should_add_to_routing {
debug!("Peer {} already tracked in DHT routing state", peer_id);
return;
}
Copilot AI Feb 12, 2026

This promotion logic can permanently skip adding a peer to the DHT routing table in common orderings. Example: update_peer_info() may insert/update a peer with non-empty peer_info.addresses before handle_peer_connected() runs; then had_addresses becomes true and should_add_to_routing becomes false, and the peer is never added (since update_peer_info() no longer performs the add_node call). Additionally, the code warns + returns on missing addresses.first() before checking should_add_to_routing, which can create noisy warnings and early returns even when no routing-table add is intended. Consider gating the address requirement and warning behind should_add_to_routing, and determine “already in routing table” via an explicit flag (reintroduce in_routing_table) or by making add_node idempotent (attempt to add whenever an address is available and treat “already exists” as success).

Comment on lines +2281 to +2303
// Add to DHT routing table.
{
use crate::dht::core_engine::{NodeCapacity, NodeId, NodeInfo};

let node_info = NodeInfo {
id: NodeId::from_bytes(dht_key),
address: address_str,
last_seen: SystemTime::now(),
capacity: NodeCapacity::default(),
};

if let Err(e) = self.dht.write().await.add_node(node_info).await {
warn!("Failed to add peer {} to DHT routing table: {}", peer_id, e);
} else {
info!("Added peer {} to DHT routing table", peer_id);
}
}

/// Whether an inbound DHT message should promote the sender to routing.
fn should_promote_to_routing(message: &DhtNetworkMessage) -> bool {
matches!(&message.message_type, DhtMessageType::Request)
&& matches!(
&message.payload,
DhtNetworkOperation::Put { .. } | DhtNetworkOperation::Join
)
if self.event_tx.receiver_count() > 0 {
let _ = self
.event_tx
.send(DhtNetworkEvent::PeerDiscovered { peer_id, dht_key });
}
Copilot AI Feb 12, 2026

The routing promotion behavior was substantially changed and the prior unit tests around routing promotion were removed. Add tests that cover the new expected behavior (e.g., a peer gets added to the routing table and PeerDiscovered fires exactly once), including the important ordering where peer metadata is learned/updated before the PeerConnected event (and vice versa), so routing table population stays reliable.

Collaborator

@dirvine dirvine left a comment

Let's go !!!

@mickvandijke mickvandijke merged commit 51a2c95 into main Feb 12, 2026
19 checks passed
@mickvandijke mickvandijke deleted the fix/p2p-shutdown-and-dht-routing branch February 12, 2026 16:53

3 participants