fix(p2p): graceful shutdown and DHT routing table fix by mickvandijke · Pull Request #26 · saorsa-labs/saorsa-core

mickvandijke · 2026-02-12T13:09:38Z

Summary

Graceful shutdown: Replace AtomicBool shutdown flags with CancellationToken across P2PNode, DhtNetworkManager, and TransportHandle for clean, cooperative shutdown. Adds tokio-util dependency. Consolidates scattered shutdown logic and eliminates "channel closed" noise during teardown.
DHT routing revert: Revert participation-only routing table promotion (ff30dab) — peers are once again promoted on any valid interaction, not just DHT operations, restoring proper routing table population.

Test plan

cargo test — all existing tests pass
cargo clippy -- -D warnings -D clippy::unwrap_used -D clippy::expect_used — no new warnings
Manual: start a node, confirm clean shutdown without "channel closed" errors
Manual: verify routing table populates from non-DHT peer interactions

🤖 Generated with Claude Code

Copilot

Pull request overview

Update P2P shutdown semantics to use cooperative cancellation, and adjust DHT routing table promotion behavior to restore routing population from general peer interactions.

Changes:

Replace AtomicBool shutdown flags with tokio_util::sync::CancellationToken across node/transport/DHT components.
Refactor shutdown teardown to reduce “channel closed”/join noise and improve task cancellation handling.
Rework DHT routing promotion logic (and remove participation-only promotion + its tests).

Reviewed changes

Copilot reviewed 7 out of 7 changed files in this pull request and generated 5 comments.

Show a summary per file

File	Description
tests/network_wiring_e2e_test.rs	Updates e2e test to match `P2PNode::is_running()` API change.
src/transport_handle.rs	Converts transport shutdown to `CancellationToken` and centralizes join/teardown logic.
src/transport/ant_quic_adapter.rs	Updates recv task shutdown checks to use `CancellationToken`.
src/network.rs	Replaces `running` state with a node-level `CancellationToken` and updates run/stop/is_running.
src/dht_network_manager.rs	Switches to `CancellationToken` and revises peer tracking/routing promotion behavior; removes old promotion tests.
src/dht/core_engine.rs	Switches core maintenance shutdown to `CancellationToken` and `select!`-based exit.
Cargo.toml	Adds `tokio-util` dependency to support `CancellationToken`.

Copilot · 2026-02-12T13:11:40Z

+    pub fn is_running(&self) -> bool {
+        !self.shutdown.is_cancelled()
    }


is_running() is now derived solely from shutdown.is_cancelled(), but shutdown is initialized as a fresh (non-cancelled) token in new(). That makes a newly constructed node report is_running() == true even before start() is called, which contradicts the semantics used by the existing tests (e.g., test_p2pnode_initialization). Consider reintroducing an explicit running/started state, or storing the shutdown token in a replaceable slot (e.g., RwLock<CancellationToken>) so start() creates a new token and stop() cancels it; then is_running() can reflect “started && not cancelled”.

Copilot · 2026-02-12T13:11:41Z

            loop {
-                if shutdown.load(Ordering::Relaxed) {
+                if shutdown.is_cancelled() {
                    break;
                }
                match transport.endpoint().recv().await {


The recv loop checks shutdown.is_cancelled() only before awaiting endpoint().recv().await, so shutdown may block indefinitely if no packets arrive. Use tokio::select! to await either shutdown.cancelled() or transport.endpoint().recv() so the task can exit promptly on shutdown.

Copilot · 2026-02-12T13:11:41Z

+    /// Update peer information
+    async fn update_peer_info(&self, peer_id: PeerId, _message: &DhtNetworkMessage) {
        let dht_key = crate::dht::derive_dht_key_from_peer_id(&peer_id);


update_peer_info() no longer promotes peers into the routing table at all (the previous promotion block was removed), but it still populates DhtPeerInfo with addresses. Combined with the current handle_peer_connected() logic (which only adds to the routing table when addresses are newly discovered), a peer first seen via DHT traffic can end up never being added to the routing table because it already has addresses by the time it connects. To restore “promote on any valid interaction,” either (a) re-add routing table insertion in update_peer_info() when an address is available, or (b) make handle_peer_connected() attempt add_node whenever the peer is connected and has an address (ideally with an explicit ‘already added’ guard or an idempotent/‘already present’ handling path).

Copilot · 2026-02-12T13:11:41Z

+        // Track peer and decide whether it should be promoted to routing table.
+        let should_add_to_routing = {
+            let mut peers = self.dht_peers.write().await;
+            match peers.entry(peer_id.clone()) {
+                std::collections::hash_map::Entry::Occupied(mut entry) => {
+                    let peer_info = entry.get_mut();
+                    let had_addresses = !peer_info.addresses.is_empty();
+                    peer_info.last_seen = Instant::now();
+                    peer_info.is_connected = true;
+                    if !addresses.is_empty() {
+                        peer_info.addresses = addresses.clone();
+                    }
+                    !addresses.is_empty() && !had_addresses
+                }


The should_add_to_routing heuristic is based on “addresses became non-empty,” which conflates ‘has enough info to add’ with ‘not yet added’. Since in_routing_table was removed, there’s no durable way to prevent duplicate adds or to retry after a prior add_node failure, and there’s also a risk of never adding peers that already had addresses from earlier tracking. Consider restoring an explicit routing-membership flag (or querying the core engine for presence) so the decision is based on actual routing table state rather than address transitions.

Copilot · 2026-02-12T13:11:42Z

@@ -2713,69 +2686,3 @@ impl Default for DhtNetworkConfig {
        }
    }
 }


This PR removes the existing unit tests that exercised routing promotion behavior (the deleted should_promote_to_routing tests), but the routing promotion logic is still being modified substantially. Add new tests that verify the intended behavior after the revert: peers are promoted on non-DHT interactions / general connectivity (e.g., via PeerConnected handling) and that PeerDiscovered is emitted appropriately.

greptile-apps · 2026-02-12T13:12:39Z

Greptile Overview

Greptile Summary

Replaces AtomicBool shutdown flags with CancellationToken across P2PNode, DhtNetworkManager, TransportHandle, and background tasks for cooperative shutdown. Reverts participation-only routing table promotion—peers are now added to DHT routing on any connection rather than only on DHT operations.

Major changes:

Added tokio-util dependency for CancellationToken support
Refactored shutdown coordination using tokio::select! in all background task loops
Simplified is_running() to synchronous check of cancellation token
Removed in_routing_table field from DhtPeerInfo struct
Deleted unit tests for participation-only routing promotion logic
Consolidated task join logic with helper methods in TransportHandle

Issues found:

Race condition in spawn_recv_task (src/transport/ant_quic_adapter.rs:273): shutdown check happens before blocking recv() call, preventing immediate cancellation
Potential duplicate routing table additions in handle_peer_connected when peer addresses change between connections

Confidence Score: 3/5

PR improves shutdown handling but contains two logical issues that need resolution before merge
The CancellationToken refactoring is well-executed across most files, but the recv_task in ant_quic_adapter has a race condition that could delay shutdown, and the DHT routing promotion logic may cause duplicate adds. The removal of unit tests without replacement reduces confidence in the routing table behavior changes.
Pay close attention to src/transport/ant_quic_adapter.rs (race condition) and src/dht_network_manager.rs (routing table promotion logic)

Important Files Changed

Filename	Overview
src/dht_network_manager.rs	Major changes: replaced `AtomicBool` with `CancellationToken`, reverted participation-only routing promotion (peers now promoted on any connection), removed unit tests. Potential duplicate routing table additions.
src/network.rs	Replaced `RwLock<bool>` with `CancellationToken` for shutdown, made `is_running()` synchronous, updated run loop to use `tokio::select!`
src/transport/ant_quic_adapter.rs	Updated `spawn_recv_task` to accept `CancellationToken`; race condition where shutdown check before blocking `recv()` won't exit until next message
src/transport_handle.rs	Replaced `AtomicBool` with `CancellationToken`, refactored shutdown with helper methods for joining tasks, updated all background tasks to use `tokio::select!`

Sequence Diagram

sequenceDiagram
    participant User
    participant P2PNode
    participant DhtNetworkManager
    participant TransportHandle
    participant BackgroundTasks
    
    User->>P2PNode: stop()
    P2PNode->>P2PNode: shutdown.cancel()
    Note over P2PNode: Signals run loop to exit
    
    P2PNode->>DhtNetworkManager: stop()
    DhtNetworkManager->>DhtNetworkManager: shutdown.cancel()
    DhtNetworkManager->>DhtNetworkManager: dht.signal_shutdown()
    Note over DhtNetworkManager: Cancels all DHT background tasks
    
    P2PNode->>TransportHandle: stop()
    TransportHandle->>TransportHandle: shutdown.cancel()
    TransportHandle->>TransportHandle: dual_node.shutdown_endpoints()
    
    par Join all background tasks
        TransportHandle->>BackgroundTasks: join recv tasks
        TransportHandle->>BackgroundTasks: join listener task
        TransportHandle->>BackgroundTasks: join connection monitor
        TransportHandle->>BackgroundTasks: join keepalive task
        TransportHandle->>BackgroundTasks: join maintenance task
    end
    
    BackgroundTasks-->>TransportHandle: Tasks complete via tokio::select!
    
    TransportHandle->>TransportHandle: disconnect_all_peers()
    TransportHandle-->>P2PNode: Transport stopped
    
    P2PNode-->>User: Node stopped gracefully

greptile-apps

_{7 files reviewed, 2 comments}

_{Edit Code Review Agent Settings | Greptile}

greptile-apps · 2026-02-12T13:12:43Z

            loop {
-                if shutdown.load(Ordering::Relaxed) {
+                if shutdown.is_cancelled() {
                    break;
                }


Race condition: shutdown check happens before recv(), but recv() can block indefinitely. If cancellation occurs while waiting in recv(), the task won't exit until the next message arrives.

Use tokio::select! to properly handle cancellation:

Suggested change

loop {

if shutdown.load(Ordering::Relaxed) {

if shutdown.is_cancelled() {

break;

}

loop {

tokio::select! {

() = shutdown.cancelled() => {

break;

}

result = transport.endpoint().recv() => {

match result {

greptile-apps · 2026-02-12T13:12:44Z

+                    let peer_info = entry.get_mut();
+                    let had_addresses = !peer_info.addresses.is_empty();
+                    peer_info.last_seen = Instant::now();
+                    peer_info.is_connected = true;
+                    if !addresses.is_empty() {
+                        peer_info.addresses = addresses.clone();
+                    }
+                    !addresses.is_empty() && !had_addresses


Logic issue with routing promotion: if a peer connects multiple times or if addresses changes from non-empty to a different non-empty list, had_addresses will be true but the condition !addresses.is_empty() && !had_addresses will be false, preventing re-promotion. However, if addresses change (e.g., peer reconnects from different IP), we'll attempt to add the peer again on line 2292, which may cause duplicate entries or errors.

Consider tracking whether the peer is already in the routing table to avoid duplicate add_node calls.

…hutdown Replace check-then-await pattern with tokio::select! to race the cancellation token against recv(), eliminating a race condition where shutdown could be missed between the is_cancelled() check and blocking on recv(). Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

mickvandijke · 2026-02-12T13:54:48Z

@greptile-apps @claude please re-review

Copilot

Pull request overview

Copilot reviewed 7 out of 7 changed files in this pull request and generated 2 comments.

Copilot · 2026-02-12T13:54:49Z

-    /// so short-lived request clients are not inserted as DHT routing peers.
-    async fn update_peer_info(&self, peer_id: PeerId, message: &DhtNetworkMessage) {
+    /// Update peer information
+    async fn update_peer_info(&self, peer_id: PeerId, _message: &DhtNetworkMessage) {


The parameter _message is prefixed with an underscore but never used in the function body. Since this parameter was previously used to determine routing promotion but is no longer needed after the revert, it should be removed from the function signature entirely.

Suggested change

async fn update_peer_info(&self, peer_id: PeerId, _message: &DhtNetworkMessage) {

async fn update_peer_info(&self, peer_id: PeerId, _: &DhtNetworkMessage) {

Copilot · 2026-02-12T13:54:49Z

+                        avg_latency: Duration::from_millis(50),
+                        reliability_score: 1.0,
+                    });
+                    !addresses.is_empty()


The logic returns true when a new peer is added with addresses, but on line 2247 it returns true only when addresses were empty before and are now non-empty (!addresses.is_empty() && !had_addresses). This inconsistency means new peers will always be promoted even if they shouldn't be, while existing peers will only be promoted when transitioning from no addresses to having addresses. The logic should be consistent: either both cases should check !had_addresses or both should simply check !addresses.is_empty().

greptile-apps · 2026-02-12T13:58:48Z

Greptile Overview

Greptile Summary

Replaced scattered AtomicBool shutdown flags with CancellationToken across P2PNode, DhtNetworkManager, TransportHandle, and background tasks for cooperative shutdown coordination. Reverted participation-only routing table promotion (commit ff30dab) — peers are now promoted to the DHT routing table on any connection event, not just DHT participation operations like Put/Join.

Key improvements:

All background tasks now use tokio::select! to respond immediately to cancellation instead of polling shutdown flags
Eliminates "channel closed" errors during teardown by ensuring proper task coordination
Simplified is_running() from async to sync method since it only checks token state
Reduced code duplication with helper methods for task cleanup in transport_handle.rs:1033-1057

DHT routing revert:

Removed in_routing_table field from DhtPeerInfo
Removed should_promote_to_routing() message filtering
Peers now added to routing table in handle_peer_connected() on first connection with valid addresses

Confidence Score: 4/5

Safe to merge with minor concern about duplicate routing table entries
Shutdown refactor is well-implemented and follows best practices. The DHT routing table revert restores proper peer discovery, but the logic in handle_peer_connected() at src/dht_network_manager.rs:2247 may allow duplicate add_node calls if peer addresses change during reconnection (see existing thread comment)
src/dht_network_manager.rs - review the routing table promotion logic around line 2247

Important Files Changed

Filename	Overview
src/network.rs	Replaced `RwLock<bool>` with CancellationToken for shutdown signaling; simplified `is_running()` to non-async; refactored run loop to use `tokio::select!` for clean cancellation
src/dht_network_manager.rs	Replaced AtomicBool with CancellationToken; reverted participation-only routing (peers now promoted on any connection); removed `in_routing_table` field from DhtPeerInfo; refactored event handler to use `tokio::select!`
src/transport_handle.rs	Replaced AtomicBool with CancellationToken for shutdown coordination; refactored all background tasks (keepalive, periodic maintenance, connection monitor) to use `tokio::select!`; added helper methods to reduce code duplication in task cleanup
src/transport/ant_quic_adapter.rs	Updated `spawn_recv_task` to accept CancellationToken instead of AtomicBool; implemented proper cancellation with `tokio::select!` to handle shutdown immediately during blocking recv()

Sequence Diagram

sequenceDiagram
    participant User
    participant P2PNode
    participant DhtNetworkManager
    participant TransportHandle
    participant AntQuicAdapter
    participant BackgroundTasks

    User->>P2PNode: stop()
    activate P2PNode
    P2PNode->>P2PNode: shutdown.cancel()
    Note over P2PNode: CancellationToken signals shutdown
    
    P2PNode->>DhtNetworkManager: stop()
    activate DhtNetworkManager
    DhtNetworkManager->>DhtNetworkManager: shutdown.cancel()
    DhtNetworkManager->>BackgroundTasks: select! cancelled()
    BackgroundTasks-->>DhtNetworkManager: tasks exit cleanly
    DhtNetworkManager-->>P2PNode: Ok
    deactivate DhtNetworkManager
    
    P2PNode->>TransportHandle: stop()
    activate TransportHandle
    TransportHandle->>TransportHandle: shutdown.cancel()
    TransportHandle->>BackgroundTasks: select! cancelled()
    BackgroundTasks-->>TransportHandle: tasks exit cleanly
    TransportHandle->>AntQuicAdapter: shutdown_endpoints()
    activate AntQuicAdapter
    AntQuicAdapter->>BackgroundTasks: recv tasks via select!
    BackgroundTasks-->>AntQuicAdapter: exit immediately
    AntQuicAdapter-->>TransportHandle: Ok
    deactivate AntQuicAdapter
    TransportHandle-->>P2PNode: Ok
    deactivate TransportHandle
    
    P2PNode-->>User: Ok
    deactivate P2PNode
    Note over User,BackgroundTasks: No "channel closed" errors

claude · 2026-02-12T13:59:59Z

Claude finished @mickvandijke's task in 2m 30s —— View job

Re-review of PR #26

Tasks:

Read all changed files
Analyze CancellationToken shutdown implementation
Check if previous review issues were addressed
Review DHT routing table promotion logic
Provide comprehensive feedback

Summary

This PR makes good progress on improving shutdown semantics with the CancellationToken refactoring, but the DHT routing table promotion logic has issues that need to be addressed before merging.

✅ What's Working Well

1. CancellationToken Migration

The refactoring from AtomicBool to CancellationToken is well-executed:

src/transport/ant_quic_adapter.rs:273-299: ✅ FIXED - Now properly uses tokio::select! to race shutdown against recv(), eliminating the race condition identified in previous reviews
src/network.rs: ✅ Clean implementation with tokio::select! in run loop
src/dht/core_engine.rs: ✅ Proper tokio::select! in maintenance tasks
src/transport_handle.rs: ✅ Good helper methods for joining tasks
src/dht_network_manager.rs: ✅ Event handler properly uses tokio::select!

2. Code Quality

No uses of .unwrap(), .expect(), or panic!() ✅
Proper error handling with Result types ✅
Good use of tracing instead of println! ✅
Consistent shutdown patterns across all background tasks ✅

🚨 Issues Found

Issue #1: Inconsistent Routing Table Promotion Logic ⚠️

Location: src/dht_network_manager.rs:2236-2262

The logic for deciding when to add peers to the routing table is inconsistent:

For existing peers (line 2247):

!addresses.is_empty() && !had_addresses  // Only on address transition

For new peers (line 2259):

!addresses.is_empty()  // Always if addresses present

Problems:

New peers: Added to routing table immediately if they have addresses ✅
Existing peers: Only added when transitioning from "no addresses" → "has addresses" ❌

Scenario that fails:

1. Peer connects, DhtPeerInfo created with addresses from transport
2. handle_peer_connected() called
3. Peer already has addresses, so had_addresses = true
4. Condition fails: !addresses.is_empty() && !had_addresses = true && false = false
5. Peer is NEVER added to routing table despite being connected!

This contradicts the PR's goal: "peers are once again promoted on any valid interaction"

Recommendation: Make routing promotion consistent. Either:

Option A - Always promote connected peers with addresses:

// For both new and existing peers
if !addresses.is_empty() {
    should_add_to_routing = true;
}

Option B - Track routing table membership explicitly:

// Add to DhtPeerInfo
in_routing_table: bool

// Then check:
!addresses.is_empty() && !peer_info.in_routing_table

Issue #2: Unused Parameter ⚠️

Location: src/dht_network_manager.rs:2170

async fn update_peer_info(&self, peer_id: PeerId, _message: &DhtNetworkMessage) {

The _message parameter is unused after removing the participation-only promotion logic.

Recommendation: Remove the parameter and update all call sites:

async fn update_peer_info(&self, peer_id: PeerId) {

Issue #3: Missing Test Coverage ⚠️

Location: The PR removes old routing promotion tests but doesn't add new ones

Per review comment at src/dht_network_manager.rs:2688, the PR deleted unit tests for the participation-only logic but didn't add tests for the reverted behavior.

Needed tests:

#[tokio::test]
async fn test_peer_promoted_on_connection() {
    // Verify peers are added to routing table on PeerConnected
}

#[tokio::test]
async fn test_peer_with_initial_addresses_promoted() {
    // Verify peers that have addresses from the start are still promoted
}

#[tokio::test]
async fn test_peer_discovered_event_emitted() {
    // Verify PeerDiscovered events work correctly
}

📋 Detailed File Review

src/transport/ant_quic_adapter.rs

Lines 273-299: ✅ spawn_recv_task now properly uses tokio::select! - race condition FIXED
Overall: Clean implementation, no issues

src/network.rs

Line 1311: ✅ is_running() correctly uses !shutdown.is_cancelled()
Overall: Good refactoring, shutdown semantics are clear

src/dht_network_manager.rs

Line 2170: ⚠️ Unused _message parameter should be removed
Lines 2236-2262: ⚠️ Routing promotion logic is inconsistent (see Issue style: Fix formatting issues to resolve CI failures #1)
Line 2247: ❌ Existing peers with addresses won't be promoted
Line 2688: ⚠️ Missing test coverage for new behavior
Overall: Shutdown refactoring is good, but routing logic needs fixes

src/transport_handle.rs

Overall: ✅ Excellent refactoring with good helper methods for task management

src/dht/core_engine.rs

Overall: ✅ Clean use of CancellationToken in maintenance tasks

🎯 Recommendations Before Merge

Must Fix:

Fix routing promotion logic to match stated goals - peers should be promoted on connection, not just address transitions
Add test coverage for the reverted routing promotion behavior

Should Fix:

Remove unused _message parameter from update_peer_info()

💡 Suggested Fix for Issue #1

Replace lines 2236-2262 with:

// Track peer and update addresses
{
    let mut peers = self.dht_peers.write().await;
    match peers.entry(peer_id.clone()) {
        std::collections::hash_map::Entry::Occupied(mut entry) => {
            let peer_info = entry.get_mut();
            peer_info.last_seen = Instant::now();
            peer_info.is_connected = true;
            if !addresses.is_empty() {
                peer_info.addresses = addresses.clone();
            }
        }
        std::collections::hash_map::Entry::Vacant(entry) => {
            entry.insert(DhtPeerInfo {
                peer_id: peer_id.clone(),
                dht_key,
                addresses: addresses.clone(),
                last_seen: Instant::now(),
                is_connected: true,
                avg_latency: Duration::from_millis(50),
                reliability_score: 1.0,
            });
        }
    }
}

// Skip peers with no addresses - they cannot be used for DHT routing.
let address_str = match addresses.first() {
    Some(addr) => addr.to_string(),
    None => {
        warn!("Peer {} has no addresses, skipping DHT routing table addition", peer_id);
        return;
    }
};

// Always try to add connected peers to routing table (idempotent operation)
{
    use crate::dht::core_engine::{NodeCapacity, NodeId, NodeInfo};

    let node_info = NodeInfo {
        id: NodeId::from_bytes(dht_key),
        address: address_str,
        last_seen: SystemTime::now(),
        capacity: NodeCapacity::default(),
    };

    match self.dht.write().await.add_node(node_info).await {
        Ok(_) => info!("Added/updated peer {} in DHT routing table", peer_id),
        Err(e) => warn!("Failed to add peer {} to DHT routing table: {}", peer_id, e),
    }
}

This ensures all connected peers with addresses are promoted to the routing table, matching the PR's stated goal.

📊 Overall Assessment

Strengths:

✅ Shutdown refactoring is well-done and consistent
✅ Code quality is high
✅ Race condition in spawn_recv_task is fixed

Weaknesses:

❌ Routing table promotion logic doesn't match stated goals
❌ Missing test coverage for reverted behavior
⚠️ Minor cleanup needed (unused parameter)

Verdict: Request changes - The shutdown refactoring is excellent, but the DHT routing promotion logic needs to be fixed to properly promote all connected peers, not just those transitioning from no-addresses to has-addresses.

| Branch: fix/p2p-shutdown-and-dht-routing

…rt() is_running() only checked !shutdown.is_cancelled(), which is true for a freshly constructed CancellationToken. Add an is_started AtomicBool that is set on start() and cleared on stop() so the method correctly reports false before the node is started. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

Copilot

Pull request overview

Copilot reviewed 7 out of 7 changed files in this pull request and generated 4 comments.

Copilot · 2026-02-12T14:33:46Z

    pub async fn stop(&self) -> Result<()> {
        info!("Stopping P2P node...");

-        // Set running state to false
-        *self.running.write().await = false;
+        // Signal the run loop to exit
+        self.shutdown.cancel();

        // Stop DHT manager first so leave messages can be sent while transport is still active.
        self.dht_manager.stop().await?;


CancellationToken cannot be “uncancelled”, but P2PNode stores a single token created in new() and cancels it in stop(). After the first stop, is_running() will always return false and run() will immediately exit even if start() is called again (since start() does not create a fresh token). If restart is intended, store the token behind interior mutability (e.g., RwLock<CancellationToken> / Mutex<CancellationToken> / ArcSwap) and replace it with a new token in start() (or create a per-run child token). If restart is not intended, consider making that explicit by returning an error on subsequent start() calls (or redesigning the API so stop() consumes the instance).

Copilot · 2026-02-12T14:33:47Z

+    pub fn is_running(&self) -> bool {
+        self.is_started.load(std::sync::atomic::Ordering::Acquire) && !self.shutdown.is_cancelled()
    }


CancellationToken cannot be “uncancelled”, but P2PNode stores a single token created in new() and cancels it in stop(). After the first stop, is_running() will always return false and run() will immediately exit even if start() is called again (since start() does not create a fresh token). If restart is intended, store the token behind interior mutability (e.g., RwLock<CancellationToken> / Mutex<CancellationToken> / ArcSwap) and replace it with a new token in start() (or create a per-run child token). If restart is not intended, consider making that explicit by returning an error on subsequent start() calls (or redesigning the API so stop() consumes the instance).

Copilot · 2026-02-12T14:33:47Z

+        // Track peer and decide whether it should be promoted to routing table.
+        let should_add_to_routing = {
+            let mut peers = self.dht_peers.write().await;
+            match peers.entry(peer_id.clone()) {
+                std::collections::hash_map::Entry::Occupied(mut entry) => {
+                    let peer_info = entry.get_mut();
+                    let had_addresses = !peer_info.addresses.is_empty();
+                    peer_info.last_seen = Instant::now();
+                    peer_info.is_connected = true;
+                    if !addresses.is_empty() {
+                        peer_info.addresses = addresses.clone();
+                    }
+                    !addresses.is_empty() && !had_addresses
+                }
+                std::collections::hash_map::Entry::Vacant(entry) => {
+                    entry.insert(DhtPeerInfo {
+                        peer_id: peer_id.clone(),
+                        dht_key,
+                        addresses: addresses.clone(),
+                        last_seen: Instant::now(),
+                        is_connected: true,
+                        avg_latency: Duration::from_millis(50),
+                        reliability_score: 1.0,
+                    });
+                    !addresses.is_empty()
                }
            }
-            std::collections::hash_map::Entry::Vacant(entry) => {
-                entry.insert(DhtPeerInfo {
-                    peer_id: peer_id.clone(),
-                    dht_key,
-                    addresses,
-                    last_seen: Instant::now(),
-                    is_connected: true,
-                    avg_latency: Duration::from_millis(50),
-                    reliability_score: 1.0,
-                    in_routing_table: false,
-                });
+        };
+
+        // Skip peers with no addresses - they cannot be used for DHT routing.
+        let address_str = match addresses.first() {
+            Some(addr) => addr.to_string(),
+            None => {
+                warn!(
+                    "Peer {} has no addresses, skipping DHT routing table addition",
+                    peer_id
+                );
+                return;
            }
        };

-        debug!(
-            "Peer {} tracked for DHT liveness; routing promotion deferred until non-lookup DHT participation",
-            peer_id
-        );
-    }
+        if !should_add_to_routing {
+            debug!("Peer {} already tracked in DHT routing state", peer_id);
+            return;
+        }


This promotion logic can permanently skip adding a peer to the DHT routing table in common orderings. Example: update_peer_info() may insert/update a peer with non-empty peer_info.addresses before handle_peer_connected() runs; then had_addresses becomes true and should_add_to_routing becomes false, and the peer is never added (since update_peer_info() no longer performs the add_node call). Additionally, the code warns + returns on missing addresses.first() before checking should_add_to_routing, which can create noisy warnings and early returns even when no routing-table add is intended. Consider gating the address requirement and warning behind should_add_to_routing, and determine “already in routing table” via an explicit flag (reintroduce in_routing_table) or by making add_node idempotent (attempt to add whenever an address is available and treat “already exists” as success).

Copilot · 2026-02-12T14:33:47Z

+        // Add to DHT routing table.
+        {
+            use crate::dht::core_engine::{NodeCapacity, NodeId, NodeInfo};
+
+            let node_info = NodeInfo {
+                id: NodeId::from_bytes(dht_key),
+                address: address_str,
+                last_seen: SystemTime::now(),
+                capacity: NodeCapacity::default(),
+            };
+
+            if let Err(e) = self.dht.write().await.add_node(node_info).await {
+                warn!("Failed to add peer {} to DHT routing table: {}", peer_id, e);
+            } else {
+                info!("Added peer {} to DHT routing table", peer_id);
+            }
+        }

-    /// Whether an inbound DHT message should promote the sender to routing.
-    fn should_promote_to_routing(message: &DhtNetworkMessage) -> bool {
-        matches!(&message.message_type, DhtMessageType::Request)
-            && matches!(
-                &message.payload,
-                DhtNetworkOperation::Put { .. } | DhtNetworkOperation::Join
-            )
+        if self.event_tx.receiver_count() > 0 {
+            let _ = self
+                .event_tx
+                .send(DhtNetworkEvent::PeerDiscovered { peer_id, dht_key });
+        }


The routing promotion behavior was substantially changed and the prior unit tests around routing promotion were removed. Add tests that cover the new expected behavior (e.g., a peer gets added to the routing table and PeerDiscovered fires exactly once), including the important ordering where peer metadata is learned/updated before the PeerConnected event (and vice versa), so routing table population stays reliable.

dirvine

Let's go !!!

mickvandijke added 2 commits February 12, 2026 13:32

fix(dht): revert participation-only routing table promotion

ff30dab

fix(p2p): fix graceful shutdown for P2PNode

08ccb3f

Copilot AI review requested due to automatic review settings February 12, 2026 13:09

Copilot AI reviewed Feb 12, 2026

View reviewed changes

greptile-apps bot reviewed Feb 12, 2026

View reviewed changes

mickvandijke and others added 2 commits February 12, 2026 14:36

style: fix import ordering in network.rs and transport_handle.rs

5587f23

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

Copilot AI review requested due to automatic review settings February 12, 2026 13:54

Copilot AI reviewed Feb 12, 2026

View reviewed changes

mickvandijke and others added 2 commits February 12, 2026 15:11

style: fix rustfmt formatting in is_running()

c6d2c5a

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

Copilot AI review requested due to automatic review settings February 12, 2026 14:31

Copilot AI reviewed Feb 12, 2026

View reviewed changes

fix(transport): use CancellationToken for adapter shutdown

46497c1

dirvine approved these changes Feb 12, 2026

View reviewed changes

mickvandijke merged commit 51a2c95 into main Feb 12, 2026
19 checks passed

mickvandijke deleted the fix/p2p-shutdown-and-dht-routing branch February 12, 2026 16:53

@@ @@ -2713,69 +2686,3 @@ impl Default for DhtNetworkConfig { @@
                       }
                   }
               }

-            loop {
-                if shutdown.load(Ordering::Relaxed) {
-                if shutdown.is_cancelled() {
-                    break;
-                }
+            loop {
+                tokio::select! {
+                    () = shutdown.cancelled() => {
+                        break;
+                    }
+                    result = transport.endpoint().recv() => {
+                        match result {

	async fn update_peer_info(&self, peer_id: PeerId, _message: &DhtNetworkMessage) {
	async fn update_peer_info(&self, peer_id: PeerId, _: &DhtNetworkMessage) {

Conversation

mickvandijke commented Feb 12, 2026

Summary

Test plan

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Reviewed changes

Uh oh!

Copilot AI Feb 12, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI Feb 12, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI Feb 12, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI Feb 12, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI Feb 12, 2026

Choose a reason for hiding this comment

Uh oh!

greptile-apps bot commented Feb 12, 2026

Greptile Overview

Greptile Summary

Confidence Score: 3/5

Important Files Changed

Sequence Diagram

Uh oh!

greptile-apps bot left a comment

Choose a reason for hiding this comment

Uh oh!

greptile-apps bot Feb 12, 2026

Choose a reason for hiding this comment

Uh oh!

greptile-apps bot Feb 12, 2026

Choose a reason for hiding this comment

Uh oh!

mickvandijke commented Feb 12, 2026

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Uh oh!

Copilot AI Feb 12, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI Feb 12, 2026

Choose a reason for hiding this comment

Uh oh!

greptile-apps bot commented Feb 12, 2026

Greptile Overview

Greptile Summary

Confidence Score: 4/5

Important Files Changed

Sequence Diagram

Uh oh!

claude bot commented Feb 12, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Re-review of PR #26

Summary

✅ What's Working Well

1. CancellationToken Migration

2. Code Quality

🚨 Issues Found

Issue #1: Inconsistent Routing Table Promotion Logic ⚠️

Issue #2: Unused Parameter ⚠️

Issue #3: Missing Test Coverage ⚠️

📋 Detailed File Review

src/transport/ant_quic_adapter.rs

src/network.rs

src/dht_network_manager.rs

src/transport_handle.rs

src/dht/core_engine.rs

🎯 Recommendations Before Merge

Must Fix:

Should Fix:

claude bot commented Feb 12, 2026 •

edited

Loading