Skip to content

fix(discv5): use Weak reference in kbuckets bg task to release port on shutdown#23282

Merged
mattsse merged 5 commits into
paradigmxyz:mainfrom
romanbrodetski-ai:fix/discv5-shutdown-port-release
Apr 24, 2026
Merged

fix(discv5): use Weak reference in kbuckets bg task to release port on shutdown#23282
mattsse merged 5 commits into
paradigmxyz:mainfrom
romanbrodetski-ai:fix/discv5-shutdown-port-release

Conversation

@romanbrodetski-ai
Copy link
Copy Markdown
Contributor

@romanbrodetski-ai romanbrodetski-ai commented Mar 30, 2026

Summary

The spawn_populate_kbuckets_bg function holds an Arc<discv5::Discv5> which prevents the discv5 node from being fully dropped when the Discv5 handle is dropped. This causes the UDP port to remain bound, making it impossible to restart a node on the same port without waiting for the OS to reclaim it.

This PR:

  • Changes spawn_populate_kbuckets_bg to accept Weak<discv5::Discv5> instead of Arc<discv5::Discv5>
  • The background task now upgrades the weak reference on each iteration and gracefully exits when the Discv5 node is dropped
  • Adds a test (discv5_releases_port_on_drop) that verifies the port is released after dropping the node

@RomanBrodetski
Copy link
Copy Markdown

RomanBrodetski commented Apr 13, 2026

@mattsse just a gentle ping on this one for feedback. If there are reasons you decide against merging this - also fine!
Thanks,
Roman

Copy link
Copy Markdown
Contributor

@DaniPopes DaniPopes left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

makes sense, lgtm, nits

Comment thread crates/net/discv5/src/lib.rs Outdated
drop(updates);
drop(node);

tokio::task::yield_now().await;
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this might be flaky since it doesn't guarantee to run the entire destructor

Comment thread crates/net/discv5/src/lib.rs Outdated
// make many fast lookup queries at bootstrap, trying to fill kbuckets at furthest
// log2distance from local node
for i in (0..bootstrap_lookup_countdown).rev() {
let Some(discv5_handle) = discv5.upgrade() else {
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

both of these upgrades are held across sleep calls, might not matter much tho

@github-project-automation github-project-automation Bot moved this from Backlog to In Progress in Reth Tracker Apr 24, 2026
…n shutdown

The `spawn_populate_kbuckets_bg` function held an `Arc<discv5::Discv5>`
which prevented the discv5 node from being fully dropped when the
`Discv5` handle was dropped. This caused the UDP port to remain bound,
making it impossible to restart a node on the same port without waiting
for the OS to reclaim it.

Switch to `Weak<discv5::Discv5>` so the background task gracefully exits
when the last strong reference is dropped, allowing the port to be
released immediately.
@romanbrodetski-ai romanbrodetski-ai force-pushed the fix/discv5-shutdown-port-release branch from ffc569a to 453cd55 Compare April 24, 2026 10:58
@RomanBrodetski
Copy link
Copy Markdown

@DaniPopes thanks for the review - I addressed the nits and - hopefully - resolved the CI. Please approve the CI run and have an another look 🙌

Copy link
Copy Markdown
Collaborator

@mattsse mattsse left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this entire disc5 start code is a bit of a mess -.-
using a weak here is fine at this point

@RomanBrodetski
Copy link
Copy Markdown

RomanBrodetski commented Apr 24, 2026

Thanks a lot for the approval @mattsse !! I merged main to this branch and need another CI approval to merge I think..

@mattsse mattsse enabled auto-merge April 24, 2026 16:42
@mattsse mattsse added this pull request to the merge queue Apr 24, 2026
@mattsse
Copy link
Copy Markdown
Collaborator

mattsse commented Apr 24, 2026

thank you @RomanBrodetski

Merged via the queue into paradigmxyz:main with commit 41c6872 Apr 24, 2026
34 checks passed
@github-project-automation github-project-automation Bot moved this from In Progress to Done in Reth Tracker Apr 24, 2026
RomanBrodetski added a commit to matter-labs/zksync-os-server that referenced this pull request Apr 27, 2026
## Summary
- use reth fork that has a fix to properly release ports. Agreed with
@itegulov on this. PR against main repo:
paradigmxyz/reth#23282 (unfortunately no
acknowledgement from maintainers for now)
- refactor the integration test harness around explicit environment ->
config -> launch flow
- unify node launch around config derivation for external nodes and
restarts
- trim redundant harness API surface and simplify gateway/multi-node
ownership

## Testing
- cargo test -p zksync_os_integration_tests --no-run
- cargo nextest run -p zksync_os_integration_tests --profile no-pig

---------

Co-authored-by: romanbrodetski-ai <romanbrodetski-ai@users.noreply.github.com>
umeeSthein pushed a commit to umeeSthein/tempo that referenced this pull request Apr 30, 2026
Automated nightly update of reth dependencies from `paradigmxyz/reth`
main branch.

## Upstream reth changes


[`7839f3d...73ec2c9`](paradigmxyz/reth@7839f3d...73ec2c9)

🔗 Amp thread:
https://ampcode.com/threads/T-019dd89b-f659-75eb-a3f2-3e769d63297f
**Engine**
- Spawn BAL hashed state before storage prefetch
([#23761](paradigmxyz/reth#23761)) and disable
BAL parallel execution by default
([#23764](paradigmxyz/reth#23764))
- Add CLI flag to disable BAL storage prefetch
([#23770](paradigmxyz/reth#23770))
- Log transient invalid header cache skips
([#23711](paradigmxyz/reth#23711))
- Add `DecodedBal` in `ExecutionEnv`
([#23675](paradigmxyz/reth#23675))

**RPC**
- Clean up eth state cache reorg entries
([#23683](paradigmxyz/reth#23683))
- Pass block timestamp to txn
([#23700](paradigmxyz/reth#23700))
- Include block numbers in `BlockRangeExceedsHead` error
([#23720](paradigmxyz/reth#23720))

**Network / P2P**
- Add Basic in-memory BAL store and BAL request to block access list
requests ([#23682](paradigmxyz/reth#23682),
[#23710](paradigmxyz/reth#23710))
- Enforce BAL response soft limit and apply count cap to handler
([#23725](paradigmxyz/reth#23725),
[#23754](paradigmxyz/reth#23754))
- Avoid RLP-decoding `NewBlock` payloads
([#23712](paradigmxyz/reth#23712)) and bound
memory footprint of p2p messages
([#23718](paradigmxyz/reth#23718))
- Memory-bound channel between network and tx manager
([#23802](paradigmxyz/reth#23802))
- Track unknown tx types in announcement metrics
([#23688](paradigmxyz/reth#23688))
- Retain active session buffer capacity
([#23702](paradigmxyz/reth#23702)) and respect
peer requirements for fetch followups
([#23706](paradigmxyz/reth#23706))
- Support binding discv5 and discv4 to the same port
([#23613](paradigmxyz/reth#23613)) and use Weak
reference in discv5 kbuckets bg task to release port on shutdown
([#23282](paradigmxyz/reth#23282))

**Trie**
- Skip DB seek on exact overlay hits
([#23559](paradigmxyz/reth#23559))
- Account for heap-allocated blinded hashes in `SparseNode::memory_size`
([#23726](paradigmxyz/reth#23726))

**DB / Storage**
- `reth db migrate-v2` for pruned nodes
([#23716](paradigmxyz/reth#23716))
- Pass `ExecutedBlocks` to `OverlayBuilder`, reduce reverts queried
([#23657](paradigmxyz/reth#23657))
- Add empty request check to storage values
([#23714](paradigmxyz/reth#23714))
- Derive `Eq` for `IntegerList`
([#23709](paradigmxyz/reth#23709)) and reorder
unix deps after strum
([#23697](paradigmxyz/reth#23697))

**Provider / EVM**
- Use overlay builders in historical state paths
([#23667](paradigmxyz/reth#23667))
- Return gas output from block builder
([#23744](paradigmxyz/reth#23744))
- Expose executor transaction result type
([#23759](paradigmxyz/reth#23759))

**Perf**
- Configurable rocksdb block cache size and re-use of mdbx provider in
re-execute ([#23701](paradigmxyz/reth#23701))
- Enable revm `p256-aws-lc-rs` feature
([#23721](paradigmxyz/reth#23721))
- Add platform-specific RUSTFLAGS to Dockerfile
([#23738](paradigmxyz/reth#23738))

**Bench**
- Add reorg mode to `new-payload-fcu`
([#23666](paradigmxyz/reth#23666))
- Enable `keccak-cache-global` in `reth-bb`
([#23723](paradigmxyz/reth#23723)) and fix
multi-executor support
([#23763](paradigmxyz/reth#23763))
- Dedupe merged BAL storage reads
([#23758](paradigmxyz/reth#23758))

**Re-execute**
- Verify reverts against changesets
([#23717](paradigmxyz/reth#23717))

**Payload**
- Track Amsterdam block gas in builders
([#23743](paradigmxyz/reth#23743))

**CLI**
- Use `TxTy`/`ReceiptTy` for static-file db get
([#23692](paradigmxyz/reth#23692)) and node
types in execution stage dump
([#23705](paradigmxyz/reth#23705))
- Preserve `trusted_nodes_only` from config when `--trusted-only` is
unset ([#23703](paradigmxyz/reth#23703))
- Avoid u64 underflow in `setup_without_evm` for genesis header
([#23728](paradigmxyz/reth#23728))

**Testing**
- Add BAL request e2e coverage
([#23727](paradigmxyz/reth#23727))
- Cover admin node info discv5 port
([#23781](paradigmxyz/reth#23781))

**Misc**
- Bump reth core to v0.3.1
([#23707](paradigmxyz/reth#23707)) and alloy-evm
to 0.33.3 ([#23778](paradigmxyz/reth#23778))
- Align ERA1 export with spec
([#23693](paradigmxyz/reth#23693))

## Migrations

🔗 Amp thread:
https://ampcode.com/threads/T-019dd89c-63d3-752f-ac22-7d4aef2ac3e3
- **Bumped reth git rev** from `7839f3d` to `73ec2c9` across all
`reth-*` workspace dependencies in
[Cargo.toml](file:///home/runner/work/tempo/tempo/Cargo.toml), and
bumped `reth-primitives-traits` from `0.3.0` to `0.3.1` and `alloy-evm`
from `0.33.0` to `0.34.0` to track upstream releases.
- **Switched p2p proxy transactions channel** from
`tokio::sync::mpsc::unbounded_channel` to
`reth_metrics::common::mpsc::memory_bounded_channel` with a 32 MiB
budget in
[bin/tempo/src/p2p_proxy.rs](file:///home/runner/work/tempo/tempo/bin/tempo/src/p2p_proxy.rs),
matching reth's new default `tx_channel_memory_limit_bytes` to prevent
unbounded memory growth; added `reth-metrics` to
[bin/tempo/Cargo.toml](file:///home/runner/work/tempo/tempo/bin/tempo/Cargo.toml).
- **Changed `BlockExecutor::commit_transaction` signature** in
[crates/evm/src/block.rs](file:///home/runner/work/tempo/tempo/crates/evm/src/block.rs)
from returning `Result<GasOutput, BlockExecutionError>` to returning
`GasOutput` directly (upstream removed the fallible return); replaced
the missing-tx error with an `expect`, and updated all tests to drop
`.unwrap()`.
- **Added `Send + 'static` bound to `TxResult` impl** for
`TempoTxResult<H>` to satisfy the new upstream trait bounds on
associated execution-result types.
- **Exposed `TempoBlockExecutor` and `TempoTxResult` as `pub`**
(previously `pub(crate)`) and re-exported them from
[crates/evm/src/lib.rs](file:///home/runner/work/tempo/tempo/crates/evm/src/lib.rs)
so they can be named in the new associated types.
- **Added associated types `TxExecutionResult` and `Executor` to
`BlockExecutorFactory` impl** for `TempoEvmConfig`, and changed
`create_executor` to return the concrete `Self::Executor<'a, DB, I>`
instead of `impl BlockExecutorFor<'a, Self, DB, I>`, tracking upstream's
move from RPIT to named associated types; dropped the now-unused
`BlockExecutorFor` import and the `'a` lifetime bounds on `DB`/`I`.
- **Implemented new `HeaderMut` methods** (`set_mix_hash`,
`set_extra_data`, `set_parent_beacon_block_root`) for `TempoHeader` in
[crates/primitives/src/reth_compat/header.rs](file:///home/runner/work/tempo/tempo/crates/primitives/src/reth_compat/header.rs)
to satisfy the expanded upstream trait, delegating to the inner Ethereum
header.

[GitHub
Workflow](https://github.com/tempoxyz/tempo/actions/runs/25101489247)

---------

Co-authored-by: Centaur AI <ai@centaur.local>
Co-authored-by: Amp <amp@ampcode.com>
Co-authored-by: klkvr <klkvrr@gmail.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

Status: Done

Development

Successfully merging this pull request may close these issues.

4 participants