fix: NAT traversal hardening and 1000-node scale fixes #70
Merged
jacderida merged 8 commits into rc-2026.4.1 on Apr 6, 2026
Conversation
mickvandijke added a commit that referenced this pull request on Apr 6, 2026
`send_dht_request` computes its identity-exchange timeout as `min(request_timeout, IDENTITY_EXCHANGE_TIMEOUT)`. With the constant at 5s the effective budget was always 5s — too tight for cross-region or congested cellular links where the two-RTT exchange plus ML-DSA-65 verification can blow past with retransmits. This is the same root cause that motivated bumping `BOOTSTRAP_IDENTITY_TIMEOUT_SECS` to 15s in 08febcc; that fix only covered the initial bootstrap dial, leaving every subsequent peer dial exposed to the original failure mode. Bumping this constant to 15s matches the bootstrap budget so both code paths now absorb the same class of slow-link delay. Resolves the Greptile P1 finding on PR #70. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
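The budget computation described above can be sketched as follows. This is a minimal illustration of the `min(request_timeout, IDENTITY_EXCHANGE_TIMEOUT)` rule, not the crate's actual code; the function name `identity_exchange_budget` is illustrative.

```rust
use std::time::Duration;

// Assumed value of the constant after the bump described above.
const IDENTITY_EXCHANGE_TIMEOUT: Duration = Duration::from_secs(15);

/// Effective identity-exchange budget: never more than the caller's
/// request timeout, but capped by the constant. With the old 5 s cap,
/// any request_timeout >= 5 s collapsed to a flat 5 s budget.
fn identity_exchange_budget(request_timeout: Duration) -> Duration {
    request_timeout.min(IDENTITY_EXCHANGE_TIMEOUT)
}

fn main() {
    // A 25 s request timeout now leaves the full 15 s for the exchange.
    assert_eq!(
        identity_exchange_budget(Duration::from_secs(25)),
        Duration::from_secs(15)
    );
    // Shorter request timeouts still win the min().
    assert_eq!(
        identity_exchange_budget(Duration::from_secs(3)),
        Duration::from_secs(3)
    );
}
```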
mickvandijke added a commit that referenced this pull request on Apr 6, 2026
The earlier wildcard substitution in `local_dht_node()` opened a UDP
socket and used the connect-then-getsockname trick to discover the
host's primary outbound interface IP, then published that as a fallback
when the listen address was a wildcard. This had three problems:
1. Blocking I/O inside an async fn: `std::net::UdpSocket::{bind,
connect,local_addr}` are sync syscalls held across no `.await`, and
`local_dht_node()` is on the hot DHT-lookup path. Wrong colour for
tokio worker threads.
2. Wrong answer for home-NAT deployments: a node behind a residential
NAT publishes an RFC1918 address (192.168.x.x) that internet peers
cannot route to. The new `dial_addresses()` helper then burned
connect attempts on guaranteed-failed dials.
3. Unnecessary: every node already has an authoritative source for its
post-NAT reflexive address — the OBSERVED_ADDRESS frames sent by
peers, plumbed through `TransportHandle::observed_external_address()`.
The previous fix made it the *first* source but kept the substitution
as a fallback "just in case." It is not needed.
This commit removes `primary_local_ip()`, `PRIMARY_IP_PROBE_V{4,6}`,
and the `UdpSocket` import. `local_dht_node()` now sources addresses
from exactly two places:
- `transport.observed_external_address()` — the OBSERVED_ADDRESS
reflexive address.
- `transport.listen_addrs()`, but only entries with a *specific* IP
(loopback or any other non-wildcard); wildcards and zero ports are
skipped.
If neither source produces an address, the published self-entry has
empty `addresses`. This is the right answer at the publish layer:
better to admit "I don't know how to be reached yet" than to lie with
a bind-side wildcard or a guessed LAN IP. The empty window closes
naturally once the first peer connects to us and OBSERVED_ADDRESS flows.
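The listen-address filtering rule above reduces to a small predicate: keep only entries with a specific IP and a real port. A sketch, with an illustrative `publishable` helper rather than the crate's actual API:

```rust
use std::net::SocketAddr;

/// Keep only listen addresses a remote peer could actually dial.
/// Wildcard IPs (0.0.0.0 / ::) and zero ports are bind-side artifacts,
/// not reachable endpoints, so they are skipped. Loopback is kept:
/// it is a specific IP and dialable in local mode.
fn publishable(addrs: &[SocketAddr]) -> Vec<SocketAddr> {
    addrs
        .iter()
        .copied()
        .filter(|a| !a.ip().is_unspecified() && a.port() != 0)
        .collect()
}

fn main() {
    let addrs: Vec<SocketAddr> = vec![
        "0.0.0.0:0".parse().unwrap(),       // wildcard IP + port 0: skipped
        "0.0.0.0:9000".parse().unwrap(),    // wildcard IP: skipped
        "127.0.0.1:55886".parse().unwrap(), // specific loopback: kept
    ];
    assert_eq!(
        publishable(&addrs),
        vec!["127.0.0.1:55886".parse::<SocketAddr>().unwrap()]
    );
}
```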
### Tests
`tests/dht_self_advertisement.rs` rewritten as five tests across two
paths:
- **Loopback bind path** (single-node, `local: true`):
- `local_mode_publishes_dialable_loopback_address`
- `local_mode_published_self_entry_matches_runtime_listen_addrs`
- `local_mode_never_publishes_port_zero`
- **Wildcard / OBSERVED_ADDRESS path** (`local: false`):
- `wildcard_bind_with_no_peers_publishes_empty_self_entry` — pins the
new "don't lie when you don't know" contract.
- `wildcard_bind_publishes_observed_address_after_peer_connection` —
primary regression test for the OBSERVED_ADDRESS flow. Two
wildcard-bound nodes connect over loopback and complete the identity exchange,
and the test polls `transport.observed_external_address()` until
the frame arrives, then asserts the observed value appears in the
published self-entry.
Resolves the Greptile P2 finding on PR #70.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
mickvandijke added a commit that referenced this pull request on Apr 6, 2026
Bundles seven review-identified fixes (H2, H3, H4, M1, M2, M3, M5) on
the NAT-traversal hardening and 1000-node scale work. Each addresses a
specific bug or correctness gap surfaced during deep review.
H2 — dispatcher fault isolation (src/transport_handle.rs):
A single shard-consumer panic caused the inbound dispatcher loop to
break, dropping the remaining seven shard senders and cascading down
all healthy shards. Switched to `try_send` with explicit handling of
both `Full` and `Closed` errors: drops are counted and warn-logged at
intervals, and the dispatcher only exits when its upstream channel
itself closes. The eight shards now provide real fault isolation.
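The `Full`/`Closed` handling shape can be sketched with std's `sync_channel` (the real code uses a tokio mpsc, but the error variants and the drop-counting pattern are analogous; `dispatch` is an illustrative name):

```rust
use std::sync::mpsc::{sync_channel, SyncSender, TrySendError};

/// Non-blocking shard handoff: a full or dead shard never stalls the
/// dispatcher or tears down its sibling shards. Drops are counted so
/// they can be warn-logged at intervals instead of panicking the loop.
fn dispatch(msg: u32, shard: &SyncSender<u32>, drops: &mut u64) {
    match shard.try_send(msg) {
        Ok(()) => {}
        // Backpressure: count the drop and keep dispatching.
        Err(TrySendError::Full(_)) => *drops += 1,
        // Dead consumer: the fault stays isolated to this shard.
        Err(TrySendError::Disconnected(_)) => *drops += 1,
    }
}

fn main() {
    // Capacity-1 channel with a live but idle receiver.
    let (tx, _rx) = sync_channel::<u32>(1);
    let mut drops = 0;
    dispatch(1, &tx, &mut drops); // fits in the buffer
    dispatch(2, &tx, &mut drops); // buffer full: counted, not blocked
    assert_eq!(drops, 1);
}
```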
H3 — bridge task split + bounded forwarder channels
(src/network.rs, src/transport/saorsa_transport_adapter.rs):
The DHT bridge ran `find_closest_nodes_network` inline inside the
select loop on the relay branch, blocking the peer-address-update
branch for the duration of the iterative lookup; the underlying
forwarder mpsc channels were `unbounded_channel` so the queue could
grow without limit during the stall. Detached the lookup + publish
into its own spawned task per relay event so the select loop keeps
polling, and replaced both forwarder channels with bounded
`channel(ADDRESS_EVENT_CHANNEL_CAPACITY)` plus a per-event drop
counter.
H4 — atomic touch fast path respects addr_type
(src/dht/core_engine.rs):
The read-lock fast path in `touch_last_seen_if_merge_noop` only
checked whether the address was already present, ignoring the
caller's `addr_type`. The relay-promotion call site in the DHT bridge
(`touch_node_typed(peer, addr, AddressType::Relay)`) on an address
previously learned as `Direct` would silently bump `last_seen` and
drop the relay re-classification, breaking the type-priority ordering
that the dialer relies on. Plumbed `addr_type` through
`try_touch_last_seen` and `touch_last_seen_if_merge_noop`; the fast
path now refuses to short-circuit when the existing classification
differs, forcing escalation to the slow path so
`merge_typed_address` can re-order.
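The fast-path gate reduces to: short-circuit only when the address is already present with the same classification. A sketch of that check under assumed names (the enum and `fast_path_ok` are illustrative, not the crate's exact types):

```rust
#[derive(Clone, Copy, PartialEq, Debug)]
enum AddressType {
    Direct,
    Relay,
}

/// The read-lock fast path may bump last_seen and return early only if
/// the existing classification matches the caller's addr_type. Any
/// mismatch (e.g. Direct -> Relay promotion) must escalate to the
/// write-lock slow path so merge_typed_address can re-order.
fn fast_path_ok(existing: Option<AddressType>, incoming: AddressType) -> bool {
    matches!(existing, Some(t) if t == incoming)
}

fn main() {
    // Same classification: safe to short-circuit.
    assert!(fast_path_ok(Some(AddressType::Direct), AddressType::Direct));
    // Relay promotion of a Direct-learned address: must not short-circuit.
    assert!(!fast_path_ok(Some(AddressType::Direct), AddressType::Relay));
    // Unknown address: a real merge is needed.
    assert!(!fast_path_ok(None, AddressType::Relay));
}
```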
M1 — observed-address cache keyed by local bind
(src/transport/observed_address_cache.rs,
src/transport/saorsa_transport_adapter.rs,
src/transport_handle.rs,
src/dht_network_manager.rs):
On a multi-homed host, observations from peers reaching the node via
different local interfaces were mixed into one keyspace, so an
address learned on interface A could be served as the self-entry
when only interface B was usable. Cache key changed from `SocketAddr`
to `(local_bind, observed)`. Added `most_frequent_recent_per_local_bind`
plural API. The forwarder captures per-stack `local_bind` before
spawning. New `TransportHandle::observed_external_addresses()` plural
accessor combines all live + cached per-bind observations;
`local_dht_node()` now publishes all of them so peers reaching the
host on any interface can dial back. Adds four new tests covering
multi-bind partitioning.
M2 — try_send for shard handoff (src/transport_handle.rs):
The dispatcher's `send().await` to a per-shard mpsc would block on
the upstream from saorsa-transport when one shard's RwLock contention
spiked, starving the other seven shards through the same single point
of serialisation the sharding was supposed to eliminate. The H2 fix's
non-blocking `try_send` also addresses this — recorded as M2 for
explicitness.
M3 — shard ordering docstring (src/transport_handle.rs):
`shard_index_for_addr` claimed to preserve per-peer ordering across
reconnects. True only across ephemeral-port reconnects from the same
IP. NAT rebinding to a new public IP, mobile Wi-Fi↔cellular roaming,
or dual-stack failover routes the peer to a different shard, breaking
causal ordering at the application layer. Docstring corrected to
"per source IP" with the caveat called out.
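The corrected contract is easy to see in a sketch of the shard selection itself, hashing only the IP half of the address (the hasher and constant here are illustrative; the crate's implementation may differ):

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};
use std::net::SocketAddr;

const NUM_SHARDS: usize = 8;

/// Hash only the source IP, so a reconnect from the same IP on a new
/// ephemeral port lands on the same shard. Ordering is therefore per
/// source IP only: a NAT rebind or Wi-Fi<->cellular roam changes the IP
/// and routes the peer to a different shard.
fn shard_index_for_addr(addr: &SocketAddr) -> usize {
    let mut h = DefaultHasher::new();
    addr.ip().hash(&mut h);
    (h.finish() as usize) % NUM_SHARDS
}

fn main() {
    let a: SocketAddr = "203.0.113.7:50001".parse().unwrap();
    let b: SocketAddr = "203.0.113.7:60002".parse().unwrap(); // same IP, new port
    // Same source IP -> same shard, regardless of the ephemeral port.
    assert_eq!(shard_index_for_addr(&a), shard_index_for_addr(&b));
}
```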
M5 — hold lock across register_connection_peer_id
(src/transport_handle.rs):
The shard consumer dropped the `peer_to_channel` write lock before
calling `dual_node.register_connection_peer_id`, leaving a window in
which the app-level map said the peer was known but the transport's
internal addr→peer map did not — a hole-punch lookup during that
window could fail spuriously. Lock now held across the registration
call.
Verification:
- cargo build --lib: clean
- cargo fmt --check: clean
- cargo clippy --lib -- -D warnings -D clippy::unwrap_used \
-D clippy::expect_used: clean
- cargo clippy --all-targets -- -D warnings: clean
- cargo test --lib: 282 passed, 0 failed
- cargo test --test dht_self_advertisement: 6/6 passed
- cargo test --test two_node_messaging: 7/7 passed
- cargo test --test node_lifecycle: 17/17 passed
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Bundles six NAT-traversal fixes uncovered during a deep audit of the
sporadic-failure pattern reported by saorsa-node-lmdb.
Root cause:
DhtNetworkManager::local_dht_node() built its self-entry from the
static NodeConfig::listen_addrs() derivation, which returns wildcard
IPs (0.0.0.0 / ::) and the *configured* port (0 for ephemeral binds).
Receivers' dialable_addresses() filter then rejected the entry, so
DHT-based peer discovery returned zero contactable addresses for any
node. Connections only succeeded via side channels (static bootstrap,
in-memory cache, relay fallback) — exactly the sporadic pattern.
Rewritten as `async fn` that sources addresses, in priority order,
from TransportHandle::observed_external_address() (the OBSERVED_ADDRESS
reflexive address, which was defined on the adapter and never called
outside its own file) and the runtime listen_addrs RwLock, with
unspecified IPs substituted via a new primary_local_ip() helper that
uses the standard UdpSocket::connect-then-getsockname trick. No new
dependencies.
Other fixes bundled:
- bootstrap_from_peers now records the bootstrap peer as the natural
hole-punch coordinator referrer for every node it returned, threading
through the existing dial_candidate -> set_hole_punch_preferred_coordinator
plumbing that the iterative-lookup path already used.
- Relay-established and peer-address-update events drained via a
tokio::select! over new TransportHandle::recv_relay_established() /
recv_peer_address_update() instead of the previous 1-second polling
loop, eliminating the race window where outbound DHT queries
advertised no relay address for up to a second after relay setup.
- BOOTSTRAP_IDENTITY_TIMEOUT_SECS bumped 5 -> 15 with explanatory
comment. Identity exchange (two RTTs + ML-DSA-65 signature, ~3.3kB)
blew through 5s sporadically on congested cellular and cross-region
links.
- Added dial_addresses() helper that loops through every dialable
address until one succeeds. Replaced 5 single-address dial sites
(bootstrap_from_peers, spawn_bucket_refresh_task, trigger_self_lookup,
find_closest_nodes_network parallel-query block, send_dht_request
fallback) so a stale NAT binding on entry #1 no longer kills the
dial when entry #2 would have worked.
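The helper's shape is a loop-until-success over the candidate list. A synchronous sketch with a pluggable dial closure (the real helper is async and its error handling may differ; the signature here is illustrative):

```rust
/// Try every dialable address in order until one connects, so a stale
/// NAT binding on entry #1 no longer kills the dial when entry #2 works.
/// Returns the address that succeeded, or every error if all failed.
fn dial_addresses<A: Copy, E>(
    addrs: &[A],
    mut dial: impl FnMut(A) -> Result<(), E>,
) -> Result<A, Vec<E>> {
    let mut errs = Vec::new();
    for &a in addrs {
        match dial(a) {
            Ok(()) => return Ok(a),
            Err(e) => errs.push(e), // keep the failure, try the next entry
        }
    }
    Err(errs) // every address failed: surface all errors to the caller
}

fn main() {
    let addrs = ["10.0.0.1:9000", "198.51.100.4:9000"];
    // Simulate a stale NAT binding on the first entry.
    let r = dial_addresses(&addrs, |a| {
        if a == addrs[0] { Err("timeout") } else { Ok(()) }
    });
    assert_eq!(r, Ok("198.51.100.4:9000"));
}
```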
Regression tests:
tests/dht_self_advertisement.rs adds three #[tokio::test] cases that
pin the published self-entry behaviour. All three FAILED on main with
concrete evidence (e.g. `[Quic(0.0.0.0:0)]` published as the only
address; transport bound to port 55886 but DHT advertised port 0).
All PASS after this commit. They use only public APIs
(DhtNetworkManager::find_closest_nodes_local_with_self) so no new
test surface area was added to the library.
Verified:
cargo test --lib 269/269
cargo test --test dht_self_advertisement 3/3
cargo test --test two_node_messaging 7/7
cargo test --test node_lifecycle 17/17
cargo clippy --lib -- -D warnings clean
The dead hole-punch chain in saorsa-transport's nat_traversal_api.rs
(Bug 4 from the audit) is fixed in a separate commit in that crate.
The --port 0 default in ant-node (Bug 5) is left as a deliberate
decision for the operator config layer.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Both fields of `IPDiversityConfig` are explicitly set, so the trailing `..IPDiversityConfig::default()` triggers `clippy::needless_update` under `cargo clippy --all-targets`. Test code only; no functional change. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…ctions
saorsa-transport exposes the externally-observed address (from QUIC
OBSERVED_ADDRESS frames) only via *active* connections. When every
connection drops, the live read returns None and the node has no way to
tell the DHT how to be reached. The result is a temporary "invisible
window" where the node vanishes from DHT queries between a connection
drop and the next reconnection.
This commit adds an in-memory cache that records every
`P2pEvent::ExternalAddressDiscovered` event the transport emits and
serves as a frequency- and recency-aware fallback when no live
connection has an observation.
### Design
`TransportHandle::observed_external_address()` now has two sources:
1. Live: ask `dual_node.get_observed_external_address()` first. Unchanged;
this is the freshest answer while at least one connection is up.
2. Fallback: consult the new `ObservedAddressCache` when the live
source is empty. The cache returns the most-frequently-observed
address among recent entries, breaking ties by recency.
Different peers can legitimately observe a node at different addresses
(symmetric NAT, multi-homed hosts, dual-stack divergence). The cache
tracks how many distinct `(peer, address)` events have been received
and picks "the address most peers agree on" — saorsa-transport dedups
events per (peer, address) pair, so the count is effectively a peer
consensus signal, not raw event volume.
### NAT-rebinding handling
Pure frequency would let a long-lived stale address (count: 10000,
last seen 24h ago) win over a fresh new address (count: 5, last seen
now). That is the wrong answer after a NAT mapping rebinds.
Selection is therefore split into two passes:
1. Among entries seen within `OBSERVATION_RECENCY_WINDOW` (10 min),
return the highest-count one.
2. If nothing is recent, fall back to the global highest-count entry.
Eviction is recency-based: when the cache is full
(`MAX_CACHED_OBSERVATIONS = 16`), the entry with the oldest `last_seen`
is removed. Stale high-count entries naturally get pushed out as fresh
observations arrive at new addresses.
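The two-pass policy can be sketched directly from the description above (the `Entry` struct and `select` function are illustrative stand-ins for the cache internals):

```rust
use std::net::SocketAddr;
use std::time::{Duration, Instant};

const OBSERVATION_RECENCY_WINDOW: Duration = Duration::from_secs(600);

struct Entry {
    addr: SocketAddr,
    count: u64,
    last_seen: Instant,
}

/// Pass 1: highest count among entries inside the recency window.
/// Pass 2: only if nothing is recent, the global highest count.
/// A fresh post-rebind address (count 5, seen now) therefore beats a
/// stale one (count 10000, seen hours ago).
fn select(entries: &[Entry], now: Instant) -> Option<SocketAddr> {
    entries
        .iter()
        .filter(|e| now.duration_since(e.last_seen) <= OBSERVATION_RECENCY_WINDOW)
        .max_by_key(|e| e.count)
        .or_else(|| entries.iter().max_by_key(|e| e.count))
        .map(|e| e.addr)
}

fn main() {
    let base = Instant::now();
    let now = base + Duration::from_secs(24 * 3600); // evaluate "a day later"
    let stale = Entry {
        addr: "198.51.100.1:9000".parse().unwrap(),
        count: 10_000,
        last_seen: base, // 24 h old: outside the recency window
    };
    let fresh = Entry {
        addr: "203.0.113.9:9000".parse().unwrap(),
        count: 5,
        last_seen: now,
    };
    // NAT rebind: the fresh address wins despite the stale entry's count.
    assert_eq!(select(&[stale, fresh], now), "203.0.113.9:9000".parse().ok());
}
```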
### Wiring
`DualStackNode::spawn_peer_address_update_forwarder` now takes an
`Arc<parking_lot::Mutex<ObservedAddressCache>>` and handles a third
event variant, `P2pEvent::ExternalAddressDiscovered`, recording each
event into the cache. The other two branches (PeerAddressUpdated,
RelayEstablished) are unchanged. The silent `Ok(_other)` arm is now
explicit with a comment listing the event variants we intentionally
skip.
`TransportHandle::new` creates the cache and passes a clone to the
forwarder; `new_for_tests` creates an empty cache so test handles have
the same semantics. Both constructors are in lockstep.
### Persistence
The cache is in-memory only. A process restart resets it. A freshly
started node re-discovers its current address from live connections
rather than trusting potentially-stale state from a previous run.
### Tests
- 9 unit tests on `ObservedAddressCache` covering: empty, single,
repeated, frequency, recency tiebreak, NAT-rebinding fallback,
long-quiet fallback, eviction, and re-observation-on-full.
- 1 integration test `observed_address_cache_serves_fallback_after_connection_drop`
that connects two wildcard-bound nodes over loopback, waits for
`ExternalAddressDiscovered` to reach the cache via the new
`cached_observed_external_address` polling accessor, stops node_b to
drop the live connection, waits for `connected_peers()` to drain,
then asserts that both `observed_external_address()` and the DHT
self-entry still return the cached value.
Total: 278 lib tests (269 + 9), 6 dht_self_advertisement tests (5 + 1).
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
At 1000 nodes the routing table touch was called on every inbound DHT message and always took a write lock, serialising every read behind each update. Wrap `NodeInfo.last_seen` in an `AtomicInstant` and add a fast path that atomically bumps the timestamp under a read lock when the address merge would be a no-op (address absent, already present, or skipped by the loopback-injection rule). The write-lock slow path is still used when a real address merge is needed. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
At 1000 peers the single consumer task draining every inbound message became the dominant serialisation point — each message took three async write locks plus an awaited peer-id registration before the next could even be looked at, causing response delivery to miss the 25 s caller timeout. Split the receive loop into a light dispatcher that hashes the source IP and hands off to one of 8 parallel shard consumers, each running the full parse/register/broadcast pipeline independently. Hashing by `IpAddr` (not full `SocketAddr`) keeps a peer pinned to the same shard across ephemeral-port reconnects so per-peer message ordering is preserved. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Force-pushed from fb02b49 to 3b7131d
mickvandijke added a commit that referenced this pull request on Apr 6, 2026
At 1000 nodes two hot paths dominated serialisation in the stack, both
showing up as response-delivery timeouts under load. This commit combines
those two fixes and the deep-review follow-ups that hardened them against
cascade failure and regression.

## perf(dht): atomic last_seen under a read lock

The routing-table touch was called on every inbound DHT message and
always took a write lock, serialising every read behind each update.
Wrap `NodeInfo.last_seen` in an `AtomicInstant` and add a fast path that
atomically bumps the timestamp under a read lock when the address merge
would be a no-op (address absent, already present with the same type, or
skipped by the loopback-injection rule). The write-lock slow path still
runs when a real address merge or re-classification is needed. (a9a126f)

### Fast path respects addr_type (H4, from deep review)

Initially the read-lock fast path only checked whether the address was
already present, ignoring the caller's `addr_type`. The relay-promotion
call site in the DHT bridge would silently bump `last_seen` and drop the
relay re-classification, breaking the type-priority ordering that the
dialer relies on. Plumbed `addr_type` through `try_touch_last_seen` and
`touch_last_seen_if_merge_noop`; the fast path now refuses to
short-circuit when the existing classification differs, forcing
escalation to the slow path so `merge_typed_address` can re-order.

## perf(transport): shard inbound message dispatcher by source IP

At 1000 peers the single consumer task draining every inbound message
became the dominant serialisation point — each message took three async
write locks plus an awaited peer-id registration before the next could
even be looked at, causing response delivery to miss the 25 s caller
timeout. Split the receive loop into a light dispatcher that hashes the
source IP and hands off to one of 8 parallel shard consumers, each
running the full parse/register/broadcast pipeline independently.
Hashing by `IpAddr` (not full `SocketAddr`) keeps a peer pinned to the
same shard across ephemeral-port reconnects so per-peer message ordering
is preserved for the common case. (ab3a5bf)

### Dispatcher fault isolation (H2/M2, from deep review)

A single shard-consumer panic caused the inbound dispatcher loop to
break, dropping the remaining seven shard senders and cascading down all
healthy shards. Switched to `try_send` with explicit handling of both
`Full` and `Closed` errors: drops are counted and warn-logged at
intervals (coalesced via `SHARD_DROP_LOG_INTERVAL`), and the dispatcher
only exits when its upstream channel itself closes. The non-blocking
`try_send` also addresses M2, where a blocking `send().await` on one
congested shard could starve the other seven through the same
serialisation point the sharding was meant to eliminate. The eight
shards now provide real fault isolation.

### Shard ordering docstring corrected (M3, from deep review)

`shard_index_for_addr` previously claimed to preserve per-peer ordering
across reconnects. This is true only across ephemeral-port reconnects
from the same source IP. NAT rebinding to a new public IP, mobile
Wi-Fi↔cellular roaming, or dual-stack failover routes the peer to a
different shard, breaking causal ordering at the application layer.
Docstring now says "per source IP" with the caveat called out explicitly.

### Hold lock across register_connection_peer_id (M5, from deep review)

The shard consumer dropped the `peer_to_channel` write lock before
calling `dual_node.register_connection_peer_id`, leaving a window in
which the app-level map said the peer was known but the transport's
internal addr→peer map did not — a hole-punch lookup during that window
could fail spuriously. Lock is now held across the registration call and
dropped immediately after.

## chore: drop needless struct update in IPDiversityConfig test

Both fields of `IPDiversityConfig` are explicitly set, so the trailing
`..IPDiversityConfig::default()` triggers `clippy::needless_update`
under `cargo clippy --all-targets`. Test code only; no functional
change. (171af29)

## Verification

- `cargo build --lib`: clean
- `cargo fmt --check`: clean
- `cargo clippy --lib -- -D warnings -D clippy::unwrap_used -D clippy::expect_used`: clean
- `cargo clippy --all-targets -- -D warnings`: clean
- `cargo test --lib`: 269 passed, 0 failed

This is one of two PRs that split the original PR #70, which bundled
NAT-traversal hardening with these 1000-node scale fixes. The other PR
covers the NAT-traversal work.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
mickvandijke added a commit that referenced this pull request on Apr 6, 2026
Splits the NAT-traversal and dialable-address work from the original PR #70 bundle. Root cause and fix below; other correctness fixes bundled. ## Root cause `DhtNetworkManager::local_dht_node()` built its self-entry from the static `NodeConfig::listen_addrs()` derivation, which returns wildcard IPs (`0.0.0.0` / `::`) and the configured port (`0` for ephemeral binds). Receivers' `dialable_addresses()` filter then rejected the entry, so DHT-based peer discovery returned **zero contactable addresses** for any node. Connections only succeeded via side channels (static bootstrap, in-memory cache, relay fallback) — exactly the sporadic-failure pattern reported by `saorsa-node-lmdb`. ## Fix `local_dht_node()` is now an `async fn` that sources its self-entry addresses from `TransportHandle::observed_external_addresses()` — the post-NAT addresses observed via QUIC `OBSERVED_ADDRESS` frames, one per local interface that has an observation on a multi-homed host. An intermediate iteration added a `UdpSocket::connect`-then- `getsockname` helper (`primary_local_ip`) to substitute wildcard IPs, but it was dropped in a follow-up because it ran blocking syscalls inside an async context and returned RFC1918 addresses behind home NAT; the OBSERVED_ADDRESS-only flow supersedes it and needs no interface probing. If no source produces an address, the published self-entry has an empty `addresses` vec. This is the right answer at the publish layer: better to admit "I don't know how to be reached yet" than to lie with a bind-side wildcard. The empty window closes naturally once the first peer connects and OBSERVED_ADDRESS flows. 
## Other NAT fixes bundled - **Event-driven relay/address updates** — relay-established and peer-address-update events drain via `tokio::select!` over `TransportHandle::recv_relay_established()` / `recv_peer_address_update()` instead of the previous 1-second polling loop, eliminating the race window where outbound DHT queries advertised no relay address for up to a second after relay setup. - **Identity exchange timeout bumped to 15 s** — `send_dht_request` computes its identity-exchange timeout as `min(request_timeout, IDENTITY_EXCHANGE_TIMEOUT)`; with the constant at 5 s the effective budget was always 5 s, too tight for cross-region or congested cellular links where the two-RTT exchange plus ML-DSA-65 verification can blow past with retransmits. Bumped to 15 s to match the bootstrap budget. - **Observed-address cache fallback** — saorsa-transport exposes the externally-observed address only via active connections, so a node whose every connection has just dropped has no answer for `observed_external_address()`. This introduces `ObservedAddressCache`: an in-memory, frequency- and recency-aware cache populated by `P2pEvent::ExternalAddressDiscovered` events. When the live source is empty, the cache returns the most-frequently-observed address among recent entries, breaking ties by recency. NAT-rebind handling via a two-pass selection (recent window wins over stale counts). Keyed by `(local_bind, observed)` so multi-homed hosts don't cross-pollute observations from different interfaces. The cache is reset on process restart — a fresh node re-discovers its current address from live connections rather than trusting stale state. 
- **Bridge task split + bounded forwarder channels** — the DHT bridge previously ran `find_closest_nodes_network` inline inside the select loop on the relay branch, blocking the peer-address-update branch for the duration of the iterative lookup; the underlying forwarder mpsc channels were `unbounded_channel`, so the queue could grow without limit during the stall. The lookup + publish is now detached into its own spawned task per relay event so the select loop keeps polling, and both forwarder channels are replaced with bounded `channel(ADDRESS_EVENT_CHANNEL_CAPACITY)` plus a per-event drop counter.
- **`dial_addresses()` helper** — loops through every dialable address until one succeeds. Replaces single-address dial sites in `bootstrap_from_peers`, `spawn_bucket_refresh_task`, `trigger_self_lookup`, the `find_closest_nodes_network` parallel-query block, and the `send_dht_request` fallback, so a stale NAT binding on entry #1 no longer kills the dial when entry #2 would have worked.

## Regression tests

`tests/dht_self_advertisement.rs` adds six `#[tokio::test]` cases that pin the published self-entry behaviour:

- **Loopback bind path** (single-node, `local: true`):
  - `local_mode_publishes_dialable_loopback_address`
  - `local_mode_published_self_entry_matches_runtime_listen_addrs`
  - `local_mode_never_publishes_port_zero`
- **Wildcard / OBSERVED_ADDRESS path** (`local: false`):
  - `wildcard_bind_with_no_peers_publishes_empty_self_entry` — pins the "don't lie when you don't know" contract.
  - `wildcard_bind_publishes_observed_address_after_peer_connection` — primary regression test for the OBSERVED_ADDRESS flow.
  - `observed_address_cache_serves_fallback_after_connection_drop` — pins the cache fallback behaviour after all live connections drop.

All six FAIL on `rc-2026.4.1` with concrete evidence (e.g. `[Quic(0.0.0.0:0)]` published as the only address) and PASS here.
`ObservedAddressCache` ships with 9 unit tests covering empty, single, repeated, frequency, recency tiebreak, NAT-rebind fallback, long-quiet fallback, eviction, and re-observation-on-full.

## chore: drop needless struct update in IPDiversityConfig test

Both fields of `IPDiversityConfig` are explicitly set, so the trailing `..IPDiversityConfig::default()` triggers `clippy::needless_update` under `cargo clippy --all-targets`. Test code only; no functional change.

## Verification

- `cargo build --lib`: clean
- `cargo fmt --check`: clean
- `cargo clippy --lib -- -D warnings -D clippy::unwrap_used -D clippy::expect_used`: clean
- `cargo clippy --all-targets -- -D warnings`: clean
- `cargo test --lib`: 282 passed, 0 failed
- `cargo test --test dht_self_advertisement`: 6/6 passed
- `cargo test --test two_node_messaging`: 7/7 passed
- `cargo test --test node_lifecycle`: 17/17 passed

This is one of two PRs that split the original PR #70, which bundled this NAT-traversal hardening with two 1000-node scale fixes in the DHT touch path and the inbound message dispatcher. The other PR covers the 1000-node perf work.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
jacderida approved these changes Apr 6, 2026
mickvandijke added a commit that referenced this pull request Apr 14, 2026
The classification session was sending raw listen_addrs (potentially 0.0.0.0) as candidate addresses in DialBackRequest messages. Probers could not dial wildcard addresses, so all probes failed and every node was classified as private regardless of actual reachability. Now uses transport.observed_external_addresses() as the primary source (real external IP from QUIC OBSERVED_ADDRESS frames). Falls back to listen_addrs filtered to non-wildcard IPs only when no observations are available yet (cold start before any peer has connected). This mirrors the fix in local_dht_node() from PR #70. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
## Summary
Bundles NAT-traversal correctness fixes, a new observed-address cache, and two perf fixes targeting 1000-node routing contention, all against the `rc-2026.4.1` release branch.

## NAT traversal & dialable addresses
### Root cause
`DhtNetworkManager::local_dht_node()` built its self-entry from the static `NodeConfig::listen_addrs()` derivation, which returns wildcard IPs (`0.0.0.0` / `::`) and the configured port (`0` for ephemeral binds). Receivers' `dialable_addresses()` filter then rejected the entry, so DHT-based peer discovery returned zero contactable addresses for any node. Connections only succeeded via side channels (static bootstrap, in-memory cache, relay fallback) — exactly the sporadic-failure pattern reported by `saorsa-node-lmdb`.

### Fix
`local_dht_node()` is now an `async fn` that sources its self-entry addresses from `TransportHandle::observed_external_address()` — the OBSERVED_ADDRESS reflexive address reported by peers — with the new observed-address cache as a fallback for the window immediately after a connection drop. The earlier iteration added a `UdpSocket::connect`-then-`getsockname` helper (`primary_local_ip`) to substitute wildcard IPs, but that was dropped in a follow-up commit because it ran blocking syscalls inside an async context; the OBSERVED_ADDRESS-only flow supersedes it and needs no interface probing.
### Other NAT fixes bundled

- `bootstrap_from_peers` now records the queried bootstrap peer as the hole-punch coordinator referrer for every node it returned, completing the `dial_candidate → set_hole_punch_preferred_coordinator` chain the iterative-lookup path already used.
- Relay-established and peer-address-update events now drain via `tokio::select!` over `TransportHandle::recv_relay_established()` / `recv_peer_address_update()` instead of the previous 1-second polling loop, eliminating the race window where outbound DHT queries advertised no relay address for up to a second after relay setup.
- `IDENTITY_EXCHANGE_TIMEOUT` and `BOOTSTRAP_IDENTITY_TIMEOUT_SECS` are both now 15 s (previously 5 s). Identity exchange (two RTTs + an ML-DSA-65 signature, ~3.3 kB) blew through 5 s sporadically on congested cellular and cross-region links. Review noted the initial commit only bumped the bootstrap path; the non-bootstrap `send_dht_request` path is now on the same 15 s budget.
- `dial_addresses()` loops through every dialable address until one succeeds. Applied at 5 call sites (`bootstrap_from_peers`, `spawn_bucket_refresh_task`, `trigger_self_lookup`, the `find_closest_nodes_network` parallel-query block, the `send_dht_request` fallback) so a stale NAT binding on entry #1 no longer kills the dial when entry #2 would have worked.
### Observed-address cache (new)

`src/transport/observed_address_cache.rs` (new, 372 lines) records OBSERVED_ADDRESS reflexive addresses against the local socket they were reported on, with a TTL and recency-based eviction. The cache is consulted by `TransportHandle::observed_external_address()` when no live connection currently carries a fresh reflexive address, so a node whose upstream connection just dropped can still publish a dialable self-entry for the few seconds before a new connection re-reports the address. Without this fallback the fix above degrades to "works only while you already have at least one live peer with recent OBSERVED_ADDRESS traffic."
## 1000-node scaling

Two independent contention points were identified in 1000-node testnet traces where responses missed the 25 s caller timeout (the `[STEP 6 FAILED]` / `Response channel closed (receiver timed out)` cascade):

- **perf(dht): atomic `last_seen` under a read lock** — routing-table `touch` was called on every inbound DHT message and always took a `RwLock::write()`, serialising every read behind each update. `NodeInfo.last_seen` is now an `AtomicInstant`, and there is a fast path that atomically bumps the timestamp under a read lock when the address merge would be a no-op (address absent, already present, or skipped by the loopback-injection rule). The write-lock slow path is still used when a real address merge is needed.
- **perf(transport): shard the inbound message dispatcher** — the previous single consumer task took three async write locks plus an awaited peer-id registration per message, so messages queued behind each other instead of being processed in parallel. The receive loop is now a light dispatcher that hashes the source `IpAddr` and hands off to one of 8 parallel shard consumers, each running the full parse/register/broadcast pipeline independently. Hashing by `IpAddr` (not the full `SocketAddr`) keeps a peer pinned to the same shard across ephemeral-port reconnects, so per-peer message ordering is preserved.
## Regression tests

`tests/dht_self_advertisement.rs` (new, 649 lines) pins the published self-entry behaviour with three cases covering public-mode wildcard, port match, and local-mode ephemeral bind. All three FAILED on `rc-2026.4.1` with concrete evidence (e.g. `[Quic(0.0.0.0:0)]` published as the only address; transport bound to port 55886 but the DHT advertised port 0). All pass after this branch. They use only public APIs (`DhtNetworkManager::find_closest_nodes_local_with_self`), so no new test surface area was added to the library.

## Out of scope
- saorsa-transport's `nat_traversal_api.rs` is fixed in a separate commit in that crate.
- The `--port 0` default in `ant-node` remains a deliberate operator-config decision.

## Commits
- `08febcc` fix: advertise dialable addresses to DHT and harden NAT traversal
- `4aeecf6` chore: drop needless struct update in IPDiversityConfig test
- `a528dec` fix: bump IDENTITY_EXCHANGE_TIMEOUT to 15s for slow-link parity
- `26eb128` fix: drop primary_local_ip and rely on OBSERVED_ADDRESS for self-entry
- `7086ccb` feat: cache observed external addresses as fallback for dropped connections
- `0a854cd` perf(dht): atomic last_seen to run touch under a read lock
- `cf1f32d` perf(transport): shard inbound message dispatcher by source IP

## Test plan
- `cargo test --lib`
- `cargo test --test dht_self_advertisement` — 3/3 passing (new regression suite)
- `cargo test --test two_node_messaging`
- `cargo test --test node_lifecycle`
- `cargo clippy --lib -- -D warnings` — clean
- `cargo test --lib` and clippy after the two `perf:` commits
- `saorsa-2.saorsalabs.com:10000` / `saorsa-3.saorsalabs.com:10000` bootstrap nodes
- `[STEP 6 FAILED]` / `Response channel closed` cascade

## Greptile Summary
This PR bundles a well-scoped set of fixes and performance improvements targeting DHT self-advertisement correctness, NAT traversal reliability, and routing-table write-lock contention at 1000-node scale.
Core changes:
- `local_dht_node()` is now `async` and sources self-entry addresses exclusively from `TransportHandle::observed_external_addresses()` (live QUIC OBSERVED_ADDRESS frames) and the runtime-bound listen address set, discarding the prior `NodeConfig::listen_addrs()` call that returned un-dialable wildcard/port-0 values
- `ObservedAddressCache` (`src/transport/observed_address_cache.rs`) fills the "invisible window" between connection drop and reconnection with frequency+recency-aware address selection; multi-homed hosts are partitioned by the `(local_bind, observed)` key
- `bootstrap_from_peers` now passes the bootstrap peer's socket address as hole-punch coordinator referrer to every `dial_addresses` call
- `IDENTITY_EXCHANGE_TIMEOUT` and `BOOTSTRAP_IDENTITY_TIMEOUT_SECS` unified at 15 s (was 5 s) to handle congested cellular / cross-region links
- `NodeInfo.last_seen` promoted to `AtomicInstant` (`parking_lot::Mutex<Instant>`) so the hot `touch` path can run under a read lock on the routing table instead of taking a write lock on every inbound DHT message
- Event-driven relay/address updates via `tokio::select!` over `recv_relay_established` / `recv_peer_address_update`
- New `tests/dht_self_advertisement.rs` (649 lines) that confirmed the exact failure modes on the base branch and passes cleanly on this branch

### Confidence Score: 5/5
Safe to merge — all remaining findings are P2 style suggestions with no production correctness impact
No P0 or P1 issues found. The two P2 comments flag a write-lock optimization opportunity in the shard consumer and the use of DefaultHasher (which is deterministic within a run and carries no correctness risk for a load-balancing shard key). The root-cause fixes are correct, the new ObservedAddressCache is well-designed and thoroughly tested, the regression suite concretely demonstrates the pre-fix failures, and the atomic touch path is a valid read-lock optimization with proper fallback. Prior concern about blocking I/O in local_dht_node() is fully resolved by the OBSERVED_ADDRESS-only approach.
src/transport_handle.rs — the shard consumer write-lock pattern is worth a follow-up optimization once the scalability impact is measured in production
Flowchart
```mermaid
%%{init: {'theme': 'neutral'}}%%
flowchart TD
    A[Inbound QUIC message] --> B[upstream_rx channel]
    B --> C{shard_index_for_addr\nhash source IP}
    C -->|shard 0| D0[Shard Consumer 0]
    C -->|shard 1| D1[Shard Consumer 1]
    C -->|...| D2[...]
    C -->|shard 7| D7[Shard Consumer 7]
    D0 & D1 & D2 & D7 --> E[parse_protocol_message]
    E --> F{authenticated?}
    F -->|yes| G[peer_to_channel.write\nchannel_to_peers.write\nregister_connection_peer_id]
    F -->|no| H[broadcast P2PEvent]
    G --> I{is /rr/ response?}
    I -->|yes| J[active_requests.write\ndeliver to oneshot]
    I -->|no| H
    subgraph NAT [NAT Self-Entry Fix]
        K[local_dht_node async] --> L{observed_external\naddresses?}
        L -->|yes| M[publish observed addr\nper interface]
        L -->|no| N[ObservedAddressCache\nfallback]
        N --> O{specific IP\nlisten addr?}
        O -->|yes| P[publish listen addr]
        O -->|no| Q[publish nothing —\nhonest empty entry]
    end
```