Skip to content

feat(engine): slow block logs#21433

Merged
mattsse merged 57 commits intomainfrom
feature/execution-metrics-standardization
Mar 6, 2026
Merged

feat(engine): slow block logs#21433
mattsse merged 57 commits intomainfrom
feature/execution-metrics-standardization

Conversation

@shekhirin
Copy link
Copy Markdown
Member

@shekhirin shekhirin commented Jan 26, 2026

Ref: https://hackmd.io/dg7rizTyTXuCf2LSa2LsyQ, https://ethresear.ch/t/data-driven-analysis-on-eip-7907/23850

Reference implementations in other clients: NethermindEth/nethermind#10288, ethereum/go-ethereum#33655

Adds configurable slow block logging via --engine.slow-block-threshold that emits a structured warn! log with detailed timing, state-operation counts, and cache hit-rate metrics for blocks whose total processing time exceeds the threshold.

When the threshold is configured, block validation instruments the state provider to collect fetch counts and latencies, and tracks cache hits/misses. After persistence commits, blocks exceeding the threshold emit a ConsensusEngineEvent::SlowBlock carrying an ExecutionTimingStats snapshot. The node events handler formats this into a single structured log line with execution time, state root time, commit time, mgas/s, account/storage/code fetch counts, and cache hit rates.

When no threshold is set, no stats are collected and no additional allocations occur.

Key types introduced:

  • ExecutionTimingStats (reth-chain-state): 24-field struct capturing block number/hash, execution/state-root/commit durations, state operation counts, and cache metrics
  • SlowBlockInfo (reth-engine-primitives): wrapper carrying stats + commit duration + total duration
  • CacheStats (engine-tree): atomic counters for cache hit/miss tracking
  • StateProviderStats (engine-tree): atomic counters for state provider fetch counts and latencies
  • ValidationOutcome (engine-tree): result type alias for block validation with optional timing stats

Also refactors PersistenceResult to carry commit duration alongside the last persisted block, replacing the separate SaveBlocksResult type.

@github-project-automation github-project-automation Bot moved this to Backlog in Reth Tracker Jan 26, 2026
@shekhirin shekhirin force-pushed the feature/execution-metrics-standardization branch 9 times, most recently from 3edfd77 to 57b9e54 Compare January 27, 2026 15:50
@shekhirin shekhirin added C-enhancement New feature or request A-observability Related to tracing, metrics, logs and other observability tools labels Jan 27, 2026
@shekhirin shekhirin marked this pull request as ready for review January 27, 2026 18:01
Copy link
Copy Markdown
Collaborator

@mattsse mattsse left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is quite a lot of stuff -.-

but I guess all of these make sense, I'd like to avoid having this leak as part of the block type

also have some suggestions for how to avoid adding this to the persistence task as well.

}
}

/// A handle to [`InstrumentedStateProvider`] that can be used to fetch recorded metrics.
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i find handle here a bit confusing, isnt this now just a container for metrics?

Comment thread crates/node/core/src/args/debug.rs Outdated
fn from_str(s: &str) -> Result<Self, Self::Err> {
if s.is_empty() {
return Ok(Self(Default::default()))
return Ok(Self(Default::default()));
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

unrelated

let Self { builder, task_executor } = self;

let engine_tree_config = builder.config.engine.tree_config();

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

unrelated

Comment thread crates/ethereum/evm/Cargo.toml Outdated
Comment on lines +37 to +45
reth-evm = { workspace = true, features = ["test-utils", "metrics"] }
reth-revm.workspace = true
reth-execution-types.workspace = true
secp256k1.workspace = true
alloy-genesis.workspace = true
revm-bytecode.workspace = true
revm-database.workspace = true
revm-primitives.workspace = true
tracing-subscriber = { workspace = true, features = ["env-filter"] }
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why do we need all of this here now?

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To enable testing. I needed all of the dev depencencies in order to be able to run the tests.

Comment on lines +106 to +126
/// Trie context needed for [`BasicEngineValidator::spawn_deferred_trie_task`]
#[derive(Debug)]
struct TrieCtx<P> {
hashed_state: Arc<HashedPostState>,
trie_output: Arc<TrieUpdates>,
overlay_factory: OverlayStateProviderFactory<P>,
}

impl<P> TrieCtx<P> {
fn new(
hashed_state: HashedPostState,
trie_output: TrieUpdates,
overlay_factory: OverlayStateProviderFactory<P>,
) -> Self {
Self {
hashed_state: Arc::new(hashed_state),
trie_output: Arc::new(trie_output),
overlay_factory,
}
}
}
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

im losing it with amp, every single type it puts new types at the wrong place, this does not belong above the main type

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This wasn't in my original PR. @shekhirin I assume is your addition?

Comment thread crates/engine/tree/src/persistence.rs Outdated
Comment on lines +189 to +202
// Emit unified slow block logs after commit completes
for stats in timing_stats {
let total_duration = stats.execution_duration +
stats.state_read_duration +
stats.state_hash_duration +
commit_duration;

// Check if total time exceeds threshold
if let Some(threshold) = self.slow_block_threshold &&
total_duration > threshold
{
self.log_slow_block(stats, commit_duration, total_duration);
}
}
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can we extract this to a helper fn perhaps?

Comment thread crates/engine/tree/src/persistence.rs Outdated
self.metrics.save_blocks_duration_seconds.record(total_commit_duration);

// Emit unified slow block logs after commit completes
for stats in timing_stats {
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this loop is a noop if no slow_block_threshold is set right?

Comment thread crates/chain-state/src/in_memory.rs Outdated
Comment thread crates/engine/tree/src/persistence.rs Outdated
Comment thread crates/engine/tree/src/persistence.rs Outdated
@github-project-automation github-project-automation Bot moved this from Backlog to In Progress in Reth Tracker Jan 27, 2026
Comment thread crates/chain-state/src/execution_stats.rs Outdated
Comment thread crates/chain-state/src/execution_stats.rs
Implements execution metrics following the cross-client specification:
https://github.com/ethereum/execution-specs/blob/main/docs/execution-metrics-spec.md

- Add slow block logging method to ExecutorMetrics
- Output structured JSON log entries for cross-client analysis
- Include timing, throughput, and state access counts
- Threshold configurable via SLOW_BLOCK_THRESHOLD_MS constant
Implement standardized JSON format for slow block logging to enable
cross-client performance analysis and protocol research.

This change is part of the Cross-Client Execution Metrics initiative
proposed by Gary Rong: https://hackmd.io/dg7rizTyTXuCf2LSa2LsyQ

The standardized metrics enabled data-driven analysis like the EIP-7907
research: https://ethresear.ch/t/data-driven-analysis-on-eip-7907/23850

JSON format includes:
- block: number, hash, gas_used, tx_count
- timing: execution_ms, total_ms
- throughput: mgas_per_sec
- state_reads: accounts, storage_slots, code, code_bytes
- state_writes: accounts, storage_slots
- cache: account/storage/code hits, misses, hit_rate
Convert timing parameters from u64 to f64 to preserve sub-millisecond
precision in slow block JSON output.

Changes:
- log_slow_block() timing params: u64 -> f64
- is_slow_block(): u64 -> f64 with threshold cast
- Tracing format: add {:.3} for 3 decimal places
- Update tests for f64 values
mattsse and others added 7 commits March 5, 2026 10:01
Remove CacheStats from CachedStateMetrics and pass it as an optional
Arc<CacheStats> to CachedStateProvider only when slow_block_threshold is
configured. This avoids unnecessary atomic increments on every cache
hit/miss when slow block logging is disabled.

Replace InstrumentedStateProviderHandle/InstrumentedStateProviderHandleInner
with a flat StateProviderStats struct shared via Arc. The provider now
holds metrics and stats directly, removing a layer of indirection.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@mattsse mattsse force-pushed the feature/execution-metrics-standardization branch from a95a657 to 82dd69a Compare March 5, 2026 09:34
…r for cache_stats

Use with_cache_stats() builder method instead of adding a parameter to
the constructor. Also fix pre-existing const fn breakage on nightly for
ExecutionCache::new and SavedCache::new.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@mattsse mattsse force-pushed the feature/execution-metrics-standardization branch from 067a938 to 41ab9d1 Compare March 5, 2026 10:26
mattsse and others added 6 commits March 5, 2026 11:45
Eliminates PersistenceRx enum and SaveBlocksResult by using a single
PersistenceResult type for all persistence operations, simplifying
wait_for_event from duplicated per-variant select blocks to a single
unified select_biased block.

Fixes total_duration double-counting (state_read_duration is a subset
of execution_duration). Removes premature purge_timing_stats call in
on_new_persisted_block that was clearing stats before the threshold
check could run. Fixes divide-by-zero in mgas/s calculation. Corrects
misleading doc comments on Duration fields in ExecutionTimingStats.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…ock logging

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…_complete

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Comment thread crates/engine/tree/src/persistence.rs
mattsse and others added 4 commits March 5, 2026 13:20
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
When CacheStats is present (slow block threshold configured), record
only to CacheStats during execution and flush to prometheus in a
single batch at the end, avoiding double recording on every cache
access.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@shekhirin shekhirin force-pushed the feature/execution-metrics-standardization branch from 25a187e to 25c62b0 Compare March 6, 2026 12:46
Resolved conflicts in config.rs and engine.rs where sparse_trie_max_storage_tries
was replaced by sparse_trie_max_hot_slots + sparse_trie_max_hot_accounts on main.
Kept both main's LFU pruning renames and the PR's slow_block_threshold additions.

Amp-Thread-ID: https://ampcode.com/threads/T-019cc34f-c511-71b7-b6bd-ad8600e06a51
Co-authored-by: Amp <amp@ampcode.com>
@mattsse mattsse enabled auto-merge March 6, 2026 13:38
@mattsse mattsse added this pull request to the merge queue Mar 6, 2026
Merged via the queue into main with commit 99aea38 Mar 6, 2026
39 checks passed
@mattsse mattsse deleted the feature/execution-metrics-standardization branch March 6, 2026 13:56
@github-project-automation github-project-automation Bot moved this from In Progress to Done in Reth Tracker Mar 6, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

A-observability Related to tracing, metrics, logs and other observability tools C-enhancement New feature or request M-prevent-stale Prevents old inactive issues/PRs from being closed due to inactivity

Projects

Status: Done

Development

Successfully merging this pull request may close these issues.

5 participants