perf: collect-based fast path for small block tx iterator#22078
Conversation
For blocks with < 128 transactions, use rayon's order-preserving `collect()` instead of the out-of-order channel + `BTreeMap` reorder pipeline. This eliminates a `spawn_blocking` task and `BTreeMap` allocations for the common small-block case. For larger blocks, replace the `BTreeMap` reorder with a pre-allocated `Vec<Option<_>>` buffer for O(1) insert and lookup.

Amp-Thread-ID: https://ampcode.com/threads/T-019c4d83-74c0-739b-84aa-6437db8d213a
yongkangc
left a comment
Review: perf: collect-based fast path for small block tx iterator
Correctness
1. Prewarming latency regression on the small-block path (blocking)
The `par_iter().map().collect()` path delays all prewarm sends until every transaction conversion completes. In the current code, `prewarm_tx.send()` happens immediately inside `for_each_with` as each conversion finishes, so the prewarm task can start executing speculatively while other conversions are still running.
This matters most for blocks with 50–127 transactions (above `SMALL_BLOCK_TX_THRESHOLD`, so prewarm is enabled, but below the new 128 threshold, so the collect path runs). These blocks will get zero prewarm overlap with conversion, making prewarming less effective or useless. The current streaming design is intentionally pipelined for this reason.
Suggestion: Keep the prewarm sends streaming (inside the `map` closure or via `for_each_with`) even if the execution stream ordering is optimized. Prewarming order does not matter.
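The suggestion above can be sketched as follows. Names are hypothetical, and a plain sequential iterator stands in for rayon's `par_iter()` so the sketch compiles with std only; in the real code the sender would be cloned per worker (as `for_each_with` does), since out-of-order prewarm sends are fine.

```rust
use std::sync::mpsc;

// Sketch: keep prewarm sends streaming inside the map closure, so prewarming
// overlaps conversion, while collect() still yields in-order results for the
// execution stream. The conversion itself is a placeholder (tx * 2).
fn convert_with_streaming_prewarm(
    txs: &[u64],
    prewarm_tx: &mpsc::Sender<u64>,
) -> Vec<Result<u64, String>> {
    txs.iter()
        .map(|&tx| {
            let converted = tx * 2; // placeholder for the real tx-env conversion
            // Send to the prewarm task immediately; prewarming does not care
            // about ordering, so this is safe under parallel execution too.
            let _ = prewarm_tx.send(converted);
            Ok(converted)
        })
        .collect() // execution stream still receives in-order results
}
```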
2. Silent transaction drop in the large-block path (blocking)
```rust
} else if idx < buf.len() {
    buf[idx] = Some(tx);
}
```

If `idx >= buf.len()`, the transaction is silently dropped. While `enumerate()` on an `IndexedParallelIterator` of known length should never produce an out-of-range index, silently discarding a transaction is a critical failure mode: the `execute_rx` consumer will block forever waiting for a transaction that was dropped, or the block will be built with missing transactions. This should at minimum be a `debug_assert!` / panic, not a silent discard.
```rust
// Suggested fix:
debug_assert!(
    idx < buf.len(),
    "transaction index {idx} out of range for buffer length {}",
    buf.len()
);
buf[idx] = Some(tx);
```

Performance
3. `Vec<Option<Result<…>>>` pre-allocation overhead
The Vec is pre-allocated for all transactions:

```rust
let mut buf: Vec<Option<Result<_, _>>> = (0..tx_count).map(|_| None).collect();
```

The `BTreeMap` only ever holds the out-of-order tail, typically bounded by worker parallelism (the rayon thread count), not by block size. For a block with 5000 transactions, this allocates and zeroes 5000 slots of `Option<Result<WithTxEnv<…>, Err>>` (which are not small), while the `BTreeMap` would typically hold <64 entries at any time.
The Vec is O(1) lookup vs O(log n) for the `BTreeMap`, but with the working set bounded by parallelism, the `BTreeMap` overhead is negligible while memory usage stays constant.
Suggestion: Benchmark before switching. If the Vec approach is pursued, at least use `Box<Result<…>>` per slot to reduce the per-slot size to a pointer.
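The boxed-slot variant suggested above might look like this minimal sketch (function and variable names are illustrative, not the PR's): each slot is a pointer-sized `Option<Box<T>>`, so pre-allocating `tx_count` slots costs one machine word per transaction instead of the full size of the result type.

```rust
// Reorder an out-of-order stream of (index, item) pairs into input order
// using a pre-allocated buffer of boxed slots: O(1) insert per item.
fn reorder<T>(out_of_order: Vec<(usize, T)>, tx_count: usize) -> Vec<T> {
    let mut buf: Vec<Option<Box<T>>> = (0..tx_count).map(|_| None).collect();
    for (idx, item) in out_of_order {
        // Fail loudly instead of silently dropping, per review item 2.
        assert!(idx < tx_count, "index {idx} out of range for {tx_count} slots");
        buf[idx] = Some(Box::new(item)); // O(1) insert
    }
    buf.into_iter()
        .map(|slot| *slot.expect("every index must be produced exactly once"))
        .collect()
}
```

In the real pipeline the consumer drains slots as soon as the next expected index arrives rather than waiting for all of them; this sketch only shows the buffer shape and insert path.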
Nits
4. `PARALLEL_REORDER_TX_THRESHOLD` as an associated constant
Defining it as `const PARALLEL_REORDER_TX_THRESHOLD: usize = 128;` inside the impl block is fine, but it would be more consistent with `SMALL_BLOCK_TX_THRESHOLD` (a module-level `pub const`) to also define this at module level, with a doc comment explaining the rationale and how the value was chosen.
5. Blank line left in imports
```diff
 use std::{
-    collections::BTreeMap,
+
     ops::Not,
```

The removal leaves a blank line inside the `use` block.
6. Missing benchmark data
PR body says "Benchmark pending." Given that this is a perf PR, it is hard to evaluate whether the 128 threshold or the collect-vs-streaming tradeoff is net positive without data. Would recommend running `reth-bench` before merging.
Verdict
Changes requested. The prewarming latency regression (#1) and the silent transaction drop (#2) are blocking issues. The performance claim needs benchmark validation (#6). The core idea of optimizing the reorder buffer is reasonable but the current implementation introduces regressions in the prewarming pipeline that should be preserved.
- Remove blank line in std `use` block (fmt)
- Collapse single-line vec init (fmt)
- Add backticks around `BTreeMap` in doc comment (clippy)

Amp-Thread-ID: https://ampcode.com/threads/T-019c4e7e-4c60-7228-a6e8-e24c9bdb0aa4
Closing: the small-block collect-based path introduces a barrier that serializes the convert → prewarm → execute pipeline: no prewarm or execution sends happen until ALL conversions finish, which is a latency regression vs the current streaming design. The `Vec<Option<_>>` reorder for large blocks is directionally fine but marginal. No benchmark data was provided to show a measurable improvement. If the goal is reducing per-block coordination overhead for small blocks, a better approach would be sequential convert-and-send (no rayon, no reorder thread) while preserving streaming, rather than a collect barrier.
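The sequential convert-and-send alternative mentioned in the closing comment could be sketched as follows. All names (channels, the placeholder conversion) are hypothetical; the point is that each result is sent the moment it is ready, so the pipeline stays streaming with no rayon coordination and no reorder step.

```rust
use std::sync::mpsc;

// Small-block alternative: convert sequentially on the current thread and
// forward each result immediately. Results are produced in input order, so
// no reorder buffer or reorder task is needed, and prewarm/execute consumers
// can start working while later conversions are still running.
fn sequential_convert_and_send(
    txs: &[u64],
    prewarm_tx: &mpsc::Sender<u64>,
    execute_tx: &mpsc::Sender<Result<u64, String>>,
) {
    for &tx in txs {
        let converted = tx * 2; // placeholder for the real tx-env conversion
        let _ = prewarm_tx.send(converted); // prewarm can start immediately
        let _ = execute_tx.send(Ok(converted)); // already in order
    }
}
```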
Summary
Reth spawns the same heavy parallel tx conversion pipeline regardless of block size: rayon `for_each_with` sends results out-of-order through an intermediate channel, then a second `spawn_blocking` task reorders via `BTreeMap`. For small blocks (<30 txs), this coordination overhead exceeds the benefit. This PR adds a collect-based fast path using `par_iter().map().collect()`, which preserves order natively, eliminating the intermediate channel and the reorder task. For large blocks, it replaces `BTreeMap` with a pre-allocated `Vec<Option<_>>` for O(1) reorder.

Changes
- Add `PARALLEL_REORDER_TX_THRESHOLD` (30 txs)
- Replace `rayon::spawn` with `collect()` + sequential channel sends
- `Vec<Option<_>>` reorder buffer instead of `BTreeMap`
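The small-block fast path summarized above can be sketched like this. A plain sequential iterator stands in for rayon's `par_iter()`, which preserves input order in `collect()` the same way on an indexed iterator; the channel name and placeholder conversion are illustrative, not the PR's actual code.

```rust
use std::sync::mpsc;

// Fast path for small blocks: collect converted transactions in order, then
// send them sequentially. No intermediate out-of-order channel and no
// BTreeMap/reorder task is needed, because collect() already yields results
// in input order.
fn small_block_fast_path(
    txs: &[u64],
    execute_tx: &mpsc::Sender<Result<u64, String>>,
) {
    // In the PR this is par_iter().map(convert).collect().
    let converted: Vec<Result<u64, String>> =
        txs.iter().map(|&tx| Ok(tx * 2)).collect();
    // Sequential channel sends, already ordered.
    for result in converted {
        let _ = execute_tx.send(result);
    }
}
```

Note the tradeoff flagged in the review: nothing is sent until every conversion has finished, which is the barrier the closing comment objects to.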