feat(engine): backpressure, take 2.#23280

Merged
pepyakin merged 8 commits into
mainfrom
pep/backpressure-2
Mar 30, 2026

@pepyakin
Member

@pepyakin pepyakin commented Mar 30, 2026

This PR adds backpressure handling to the engine tree when persistence falls behind. The motivation: when the node is not able to persist blocks as fast as they arrive, it should slow down the admission of blocks in the first place.

When the gap between the canonical tip and the last persisted block exceeds --engine.persistence-backpressure-threshold (default: 16 blocks), the main loop stops reading from the engine channel and blocks on the in-flight persistence task instead.

This shifts buffering from the costlier persistence pipeline (where each block carries trie updates and heavy state) to the lighter-weight incoming engine channel (where payloads sit as raw messages). The engine API has no backpressure semantics, and standard CLs will time out and resend after ~8s, so this can't shrink the inbound queue under sustained load. Still, a) it prevents the expensive downstream queue from growing unboundedly and b) it allows clients that do care about backpressure to handle it properly.
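The stall rule described above can be illustrated with a small single-threaded model. This is only a sketch under stated assumptions: `EngineLoop`, `Step`, `enqueue`, and `step` are hypothetical names, not reth's types; a `VecDeque` stands in for the engine channel; and persistence is modeled as advancing only while the loop is stalled, which is a simplification of the real background task.

```rust
use std::collections::VecDeque;

/// Outcome of one iteration of the (hypothetical) engine loop.
#[derive(Debug, PartialEq, Eq)]
pub enum Step {
    /// A block was read from the incoming queue and became the canonical tip.
    Admitted(u64),
    /// The loop did not read the queue; it waited on persistence instead.
    Stalled { persisted_up_to: u64 },
    /// Nothing to do: no backpressure and no queued messages.
    Idle,
}

/// Illustrative model only; field and method names are assumptions.
pub struct EngineLoop {
    incoming: VecDeque<u64>, // stands in for the engine channel
    canonical_tip: u64,
    last_persisted: u64,
    backpressure_threshold: u64,
}

impl EngineLoop {
    pub fn new(backpressure_threshold: u64) -> Self {
        Self {
            incoming: VecDeque::new(),
            canonical_tip: 0,
            last_persisted: 0,
            backpressure_threshold,
        }
    }

    /// Queue an incoming block number; messages simply wait here while stalled.
    pub fn enqueue(&mut self, block_number: u64) {
        self.incoming.push_back(block_number);
    }

    pub fn queued(&self) -> usize {
        self.incoming.len()
    }

    /// Gap between the canonical tip and the last persisted block.
    pub fn gap(&self) -> u64 {
        self.canonical_tip.saturating_sub(self.last_persisted)
    }

    /// One loop iteration. `persisted_per_wait` models how many blocks the
    /// in-flight persistence task flushes before the wait returns.
    pub fn step(&mut self, persisted_per_wait: u64) -> Step {
        if self.gap() > self.backpressure_threshold {
            // Backpressure: skip the incoming queue and wait on persistence.
            let target = self.last_persisted.saturating_add(persisted_per_wait);
            self.last_persisted = target.min(self.canonical_tip);
            return Step::Stalled { persisted_up_to: self.last_persisted };
        }
        match self.incoming.pop_front() {
            Some(block_number) => {
                self.canonical_tip = block_number;
                Step::Admitted(block_number)
            }
            None => Step::Idle,
        }
    }
}
```

With a threshold of 2 and blocks 1 through 5 queued, the model admits three blocks, then stalls while the remaining two stay in the cheap incoming queue rather than entering the persistence pipeline.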

This PR supersedes #23244 and takes a simpler approach: we take the L and buffer the payload messages in the channel. This yields a simpler implementation but sacrifices some fidelity of metrics.

Observability

Two metrics are exposed under consensus.engine.beacon:

  • backpressure_active (gauge): 1.0 when the engine loop is stalled waiting on persistence, 0.0 otherwise.
  • backpressure_stall_duration (histogram): wall-clock time spent blocked on each persistence wait. Useful for understanding how long the engine loop is paused per stall event.
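To make the semantics of the two metrics concrete, here is a minimal sketch of how a stall could be instrumented. It deliberately uses only the standard library rather than a real metrics backend; `BackpressureMetrics` and `time_stall` are hypothetical names, and a plain `Vec<Duration>` stands in for the histogram.

```rust
use std::cell::{Cell, RefCell};
use std::time::{Duration, Instant};

/// Illustrative stand-in for the two metrics; not reth's actual metric types.
#[derive(Default)]
pub struct BackpressureMetrics {
    /// Models the `backpressure_active` gauge: 1.0 while stalled, else 0.0.
    active: Cell<f64>,
    /// Models the `backpressure_stall_duration` histogram as raw samples.
    stall_durations: RefCell<Vec<Duration>>,
}

impl BackpressureMetrics {
    /// Current gauge value.
    pub fn active(&self) -> f64 {
        self.active.get()
    }

    /// Number of stall events recorded so far.
    pub fn stall_count(&self) -> usize {
        self.stall_durations.borrow().len()
    }

    /// Runs `wait` (the blocking persistence wait) with the gauge raised,
    /// then lowers the gauge and records one wall-clock sample.
    pub fn time_stall(&self, wait: impl FnOnce()) -> Duration {
        self.active.set(1.0);
        let start = Instant::now();
        wait();
        let elapsed = start.elapsed();
        self.active.set(0.0);
        self.stall_durations.borrow_mut().push(elapsed);
        elapsed
    }
}
```

Each call to `time_stall` yields exactly one histogram sample, which matches the "per stall event" reading of the duration metric.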

Flag interaction

--engine.persistence-backpressure-threshold (default: 16) must be strictly greater than --engine.persistence-threshold (default: 2). If you increase the persistence threshold, you likely need to bump the backpressure threshold as well — otherwise the gap between them shrinks and backpressure kicks in too eagerly. The node will refuse to start if the backpressure threshold is not greater than the persistence threshold.
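The startup constraint between the two flags amounts to a strict inequality check. The sketch below is an assumption about its shape, not reth's code: `validate_thresholds`, the constant names, and the error text are all hypothetical; only the default values (2 and 16) and the "strictly greater" rule come from the description above.

```rust
/// Default for --engine.persistence-threshold, per the PR description.
pub const DEFAULT_PERSISTENCE_THRESHOLD: u64 = 2;
/// Default for --engine.persistence-backpressure-threshold, per the PR description.
pub const DEFAULT_BACKPRESSURE_THRESHOLD: u64 = 16;

/// Hypothetical validation helper: returns the headroom (in blocks) between
/// the two thresholds, or an error if backpressure is not strictly greater.
pub fn validate_thresholds(persistence: u64, backpressure: u64) -> Result<u64, String> {
    if backpressure <= persistence {
        return Err(format!(
            "backpressure threshold ({backpressure}) must be strictly greater \
             than persistence threshold ({persistence})"
        ));
    }
    Ok(backpressure - persistence)
}
```

With the defaults the headroom is 14 blocks. Raising the persistence threshold to 12 while leaving the backpressure threshold at 16 still passes the check, but leaves only 4 blocks of headroom, which is the "kicks in too eagerly" case.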

@github-actions
Contributor

github-actions Bot commented Mar 30, 2026

✅ Changelog found on PR.

@pepyakin pepyakin force-pushed the pep/backpressure-2 branch from 916d110 to bbc8cd0 Compare March 30, 2026 13:29
@mediocregopher
Member

derek bench

@decofe
Member

decofe commented Mar 30, 2026

cc @mediocregopher

✅ Benchmark complete! View job

Benchmark Results

| Metric | main | pep/backpressure-2 | Change |
|---|---|---|---|
| Mean | 21.69ms | 21.74ms | +0.25% ⚪ (±0.41%) |
| StdDev | 14.79ms | 14.77ms | |
| P50 | 18.78ms | 18.88ms | +0.55% ⚪ (±1.46%) |
| P90 | 32.73ms | 32.69ms | -0.12% ⚪ (±3.40%) |
| P99 | 89.35ms | 88.76ms | -0.66% ⚪ (±2.27%) |
| Mgas/s | 1475.87 | 1472.02 | -0.26% ⚪ (±0.46%) |
| Wall Clock | 22.37s | 22.40s | +0.15% ⚪ (±0.45%) |

500 blocks

Wait Time Breakdown

Persistence Wait

| Metric | main | pep/backpressure-2 |
|---|---|---|
| Mean | 65.25ms | 65.88ms |
| P50 | 0.00ms | 0.00ms |
| P95 | 240.67ms | 246.70ms |

Trie Cache Update Wait

| Metric | main | pep/backpressure-2 |
|---|---|---|
| Mean | 0.11ms | 0.10ms |
| P50 | 0.00ms | 0.00ms |
| P95 | 0.57ms | 0.50ms |

Execution Cache Update Wait

| Metric | main | pep/backpressure-2 |
|---|---|---|
| Mean | 0.00ms | 0.00ms |
| P50 | 0.00ms | 0.00ms |
| P95 | 0.00ms | 0.00ms |

Charts (images not reproduced here)

Latency, Throughput & Diff
Wait Time Breakdown
Gas vs Latency

Grafana Dashboard: View real-time metrics

@github-project-automation github-project-automation Bot moved this from Backlog to In Progress in Reth Tracker Mar 30, 2026
@pepyakin pepyakin marked this pull request as ready for review March 30, 2026 14:09
@pepyakin pepyakin added this pull request to the merge queue Mar 30, 2026
Merged via the queue into main with commit 930f2a6 Mar 30, 2026
35 checks passed
@pepyakin pepyakin deleted the pep/backpressure-2 branch March 30, 2026 15:32
@github-project-automation github-project-automation Bot moved this from In Progress to Done in Reth Tracker Mar 30, 2026
Contributor

@yongkangc yongkangc left a comment

Sick stuff

github-merge-queue Bot pushed a commit to tempoxyz/tempo that referenced this pull request Mar 31, 2026
Automated nightly update of reth dependencies from `paradigmxyz/reth`
main branch.

## Upstream reth changes


[`7f4a9a0...f0d07c3`](paradigmxyz/reth@7f4a9a0...f0d07c3)

🔗 Amp thread:
https://ampcode.com/threads/T-019d4215-b76f-7663-b8f3-8fde1da1165c
**Engine**
- Share execution cache and sparse trie pipeline with payload builder
([#23242](paradigmxyz/reth#23242),
[#23246](paradigmxyz/reth#23246))
- Add method to get payload resolve future
([#23256](paradigmxyz/reth#23256))
- Backpressure, take 2
([#23280](paradigmxyz/reth#23280))
- Return -38003 for FCUv2 payloadAttributes mismatch
([#22924](paradigmxyz/reth#22924))
- Fix double decrement in account cache size
([#23249](paradigmxyz/reth#23249))

**Trie**
- Call root before prune
([#23243](paradigmxyz/reth#23243))
- Use Entry API in `MultiProofTargets::extend_inner`
([#23247](paradigmxyz/reth#23247))
- Record trie cursor metrics
([#23252](paradigmxyz/reth#23252))
- Add `SparseStateTrie::update_account_stateless` for stateless
validation ([#23272](paradigmxyz/reth#23272))

**Networking**
- Prefer peer-reported block number in session activation
([#23275](paradigmxyz/reth#23275))
- Resolve DNS for ExternalAddr in `external_addr_with`
([#23269](paradigmxyz/reth#23269))

**Consensus**
- Retry block subscription on initial connection failure
([#23233](paradigmxyz/reth#23233))

**Refactor**
- Remove OP `ExecutionPayload` impl and op feature from
payload-primitives
([#23253](paradigmxyz/reth#23253))
- Remove OP `PayloadAttributesBuilder` impl and op feature from
engine-local ([#23255](paradigmxyz/reth#23255))
- Relax RPC converter impls
([#23254](paradigmxyz/reth#23254))

**Perf / Bench**
- Use `FastInstant` for remaining metrics timing
([#23265](paradigmxyz/reth#23265))
- Add hourly main regression bench
([#23219](paradigmxyz/reth#23219))

**DB**
- Add `create_test_provider_factory_with_chain_spec_and_db_args`
([#23270](paradigmxyz/reth#23270))

**CLI**
- Add more WARN logging before download retries
([#23258](paradigmxyz/reth#23258))
- Use `HeaderTy` for stage dump headers
([#23274](paradigmxyz/reth#23274))

**Testing**
- Add regression test for parked basefee ancestor handling
([#23277](paradigmxyz/reth#23277))

**Deps**
- Bump alloy 1.8.2, alloy-evm, lighthouse v8.1.3
([#23241](paradigmxyz/reth#23241),
[#23289](paradigmxyz/reth#23289),
[#23239](paradigmxyz/reth#23239))
- Weekly `cargo update`
([#23267](paradigmxyz/reth#23267))

## Migrations

🔗 Amp thread:
https://ampcode.com/threads/T-019d4215-e6e7-7542-acaa-239d595b2916
- **Reth dependency bump**: All `reth-*` crates updated from rev
`7f4a9a0` to `f0d07c3`; `alloy-evm` bumped from `0.29.2` to `0.30.0`
- **`reth-etl` reordered**: Moved from its previous position to sit next
to `reth-trie` (no functional change)
- **`tempo-contracts` feature dropped**: Removed default `features =
["serde"]` from workspace dependency
- **`TransactionEnv` → `TransactionEnvMut` rename**:
`reth_evm::TransactionEnv` trait was renamed to `TransactionEnvMut`; all
imports and impls updated
- **`TransactionEnv::nonce()` removed**: The `nonce(&self)` method was
removed from the trait; call sites now use
`revm::context::Transaction::nonce()` instead
- **`TxResult::into_result()` added**: New required method
`into_result(self) -> ResultAndState` added to the `TxResult` trait impl
for `TempoTxResult`
- **`BuildArguments::new` signature expanded**: Two new parameters
(`None, None`) added for `execution_cache` and `trie_handle`;
corresponding destructure updated to ignore them
- **`builder.finish()` signature change**: Now takes an additional
`Option` parameter (passed as `None`) for trie handle

[GitHub
Workflow](https://github.com/tempoxyz/tempo/actions/runs/23779717485)
github-merge-queue Bot pushed a commit to tempoxyz/tempo that referenced this pull request Apr 1, 2026
Automated nightly update of reth dependencies from `paradigmxyz/reth`
main branch.

## Upstream reth changes


[`7f4a9a0...f8efc76`](paradigmxyz/reth@7f4a9a0...f8efc76)

🔗 Amp thread:
https://ampcode.com/threads/T-019d473e-4c63-7539-bda6-d72354aae810
**Engine**
- Share execution cache and sparse trie pipeline with payload builder
([#23242](paradigmxyz/reth#23242),
[#23246](paradigmxyz/reth#23246))
- Add backpressure, take 2
([#23280](paradigmxyz/reth#23280))
- Add method to get payload resolve future
([#23256](paradigmxyz/reth#23256))
- Return -38003 for FCUv2 payloadAttributes mismatch
([#22924](paradigmxyz/reth#22924))
- Fix double decrement in account cache size
([#23249](paradigmxyz/reth#23249))

**Trie**
- Call root before prune
([#23243](paradigmxyz/reth#23243))
- Use Entry API in `MultiProofTargets::extend_inner`
([#23247](paradigmxyz/reth#23247))
- Record trie cursor metrics
([#23252](paradigmxyz/reth#23252))
- Add `SparseStateTrie::update_account_stateless` for stateless
validation ([#23272](paradigmxyz/reth#23272))

**RPC**
- Integrate `reth-rpc-traits` and remove `IntoRpcTx`
([#23288](paradigmxyz/reth#23288))
- Relax rpc converter impls
([#23254](paradigmxyz/reth#23254))

**Net**
- Prefer peer-reported block number in session activation
([#23275](paradigmxyz/reth#23275))
- Retry block subscription on initial connection failure
([#23233](paradigmxyz/reth#23233))
- Resolve DNS for ExternalAddr in `external_addr_with`
([#23269](paradigmxyz/reth#23269))

**Payload / Refactor**
- Remove OP `ExecutionPayload` impl and op feature from
payload-primitives
([#23253](paradigmxyz/reth#23253))
- Remove OP `PayloadAttributesBuilder` impl and op feature from
engine-local ([#23255](paradigmxyz/reth#23255))
- Remove changeset count APIs from storage
([#23310](paradigmxyz/reth#23310))

**DB / Storage**
- Add `create_test_provider_factory_with_chain_spec_and_db_args`
([#23270](paradigmxyz/reth#23270))
- Add `reth-bb` binary with multi-segment big block execution support
([#23140](paradigmxyz/reth#23140))
- Add era type override functionality to `EraClient`
([#23307](paradigmxyz/reth#23307))
- Make snapshot API URL overridable
([#23303](paradigmxyz/reth#23303))

**Perf**
- Use `FastInstant` for remaining metrics timing
([#23265](paradigmxyz/reth#23265))

**Bench / Testing**
- Add hourly main regression bench
([#23219](paradigmxyz/reth#23219))
- Add regression test for parked basefee ancestor handling
([#23277](paradigmxyz/reth#23277))

**CLI**
- Add more WARN logging before download retries
([#23258](paradigmxyz/reth#23258))
- Use `HeaderTy` for stage dump headers
([#23274](paradigmxyz/reth#23274))

**Deps**
- Bump alloy 1.8.2, alloy-evm, Lighthouse v8.1.3
([#23241](paradigmxyz/reth#23241),
[#23289](paradigmxyz/reth#23289),
[#23239](paradigmxyz/reth#23239))
- Weekly `cargo update`
([#23267](paradigmxyz/reth#23267))

**Grafana**
- Add sparse trie idle metrics to overview
([#23302](paradigmxyz/reth#23302))

## Migrations

🔗 Amp thread:
https://ampcode.com/threads/T-019d473e-8aba-7283-b729-057391d70bc6
- **Reth dependency bump**: All `reth-*` crates updated from rev
`7f4a9a0` to `f8efc76`
- **alloy-evm version bump**: `0.29.2` → `0.30.0`; `revm-inspectors`
`0.36.1` → `0.36.0`
- **`TransactionEnv` → `TransactionEnvMut`**: Trait moved from
`reth_evm::TransactionEnv` to `alloy_evm::TransactionEnvMut`; `nonce()`
getter removed from the trait (now accessed via
`revm::context::Transaction::nonce()` instead)
- **`FromConsensusHeader` re-export flattened**: Import path changed
from `reth_rpc_convert::transaction::FromConsensusHeader` to
`reth_rpc_convert::FromConsensusHeader`
- **`TxResult::into_result` added**: New required method on the
`TxResult` trait to consume and return `ResultAndState`
- **`BuildArguments::new` signature expanded**: Now takes two additional
`None` parameters (likely cached/prebuilt payload fields)
- **`BuildArguments` destructure**: Uses `..` rest pattern to ignore new
fields
- **`BlockBuilder::finish` signature change**: Now takes an additional
parameter (`None` — likely an optional requests list)
- **`snapshot_api_url` field added**: New required field on download URL
config struct

[GitHub
Workflow](https://github.com/tempoxyz/tempo/actions/runs/23831300439)

---------

Co-authored-by: Arsenii Kulikov <klkvrr@gmail.com>
Co-authored-by: Arsenii Kulikov <62447812+klkvr@users.noreply.github.com>
