feat(l1): bal prometheus instruments, dashboard panels, and localnet fixture#6678
feat(l1): bal prometheus instruments, dashboard panels, and localnet fixture#6678edg-l wants to merge 3 commits into
Conversation
|
Lines of code reportTotal lines added: Detailed view |
57bbd3c to
135f7e9
Compare
Adds five Prometheus instruments under `ethrex-metrics::blocks`: - `bal_blocks_total` (IntCounter): cumulative count of BAL-carrying blocks processed. - `bal_size_bytes` (Gauge) + `bal_size_bytes_histogram` (Histogram): RLP-encoded size of the most recent BAL and its distribution (exponential buckets 0, 1 KiB, 2 KiB, ..., 32 MiB). - `bal_account_count` (IntGauge): account count in the most recent BAL. - `bal_slot_count` (IntGauge): unique storage slots (writes + reads) in the most recent BAL; uses `BlockAccessList::item_count()` minus the per-address rows. Recorded from `add_block_pipeline_inner`. Source-of-truth at the recording site is `produced_bal.as_ref().or(bal)`: builder path's produced BAL when present, otherwise the validation path's header BAL. Pre-Amsterdam blocks (no BAL) skip recording. The computation lives inside `metrics!()` so it compiles out entirely on builds without the `metrics` feature. Aligns with the cross-client BAL OTel spec (https://notes.ethereum.org/@ethpandaops/bal-otel) for the metrics subset; OTel spans are a separate follow-up.
Adds a "BAL (EIP-7928)" collapsed row to ethrex_l1_perf.json with five panels driven by the new bal_* Prometheus instruments: - BAL Blocks Rate: rate(bal_blocks_total) - BAL Size (bytes): bal_size_bytes - BAL Size Distribution: heatmap on bal_size_bytes_histogram_bucket - BAL Account Count: bal_account_count - BAL Slot Count: bal_slot_count Placed after the "Block Execution Breakdown" row. The ethereum-package copy of the dashboard is regenerated by `make localnet` from this canonical file, so only the canonical copy under metrics/provisioning/grafana/dashboards/common_dashboards/ is tracked.
135f7e9 to
4072679
Compare
Previous panel gridPos used y=0/y=8 which conflicted with the row header at y=26, causing Grafana to auto-reflow the panels into a jagged layout. Use absolute y=27 / y=35 (matching the row's own y), and switch to a clean 4-up timeseries strip (Blocks Rate, Size, Account Count, Slot Count) with the Size Distribution heatmap as a full-width panel underneath.
🤖 Kimi Code ReviewThe changes look correct and well-implemented. The BAL (EIP-7928) metrics are properly typed, registered, and integrated into the existing observability stack. Minor observations:
Security/Performance: No issues. Metrics collection is read-only, occurs once per block, and uses atomic operations internally. The implementation follows existing patterns in Automated review by Kimi (Moonshot AI) · kimi-k2.5 · custom prompt |
🤖 Claude Code ReviewNow I have enough context for a thorough review. PR #6678 —
|
🤖 Codex Code Review
No EVM opcode / gas-accounting / trie / RLP correctness issues jumped out in the Rust changes themselves; the concerns here are about metrics correctness and coverage. Automated review by OpenAI Codex · gpt-5.4 · custom prompt |
Greptile SummaryThis PR instruments the EIP-7928 Block Access List (BAL) path with five Prometheus metrics, wires them into the block pipeline, and adds a "BAL (EIP-7928)" collapsed row with five Grafana panels to
Confidence Score: 4/5Safe to merge; the change adds observability-only instrumentation behind the existing metrics macro guard with no effect on block validation or consensus logic. All three files introduce metrics and dashboard configuration with no execution-critical code paths altered. The slot_count formula is consistent with the EIP-7928 BAL uniqueness invariant and the produced_bal fallback is a correct Option combinator. Both findings are purely stylistic and do not affect correctness or runtime behavior. No files require special attention; all changes are additive metrics instrumentation.
|
| Filename | Overview |
|---|---|
| crates/blockchain/metrics/blocks.rs | Adds five BAL metrics (IntCounter, Gauge, Histogram, two IntGauges) with correct registration and accessor methods; the 0.0 histogram bucket is dead code since BALs are always > 0 bytes when observed. |
| crates/blockchain/blockchain.rs | Instruments the block pipeline with BAL metrics using produced_bal.as_ref().or(bal) to prefer locally-executed BAL; slot count derived from item_count() minus account_count is consistent with EIP-7928 uniqueness guarantee. |
| metrics/provisioning/grafana/dashboards/common_dashboards/ethrex_l1_perf.json | Adds 5 Grafana panels in a collapsed BAL (EIP-7928) row with timeseries and heatmap panels; PromQL queries, units, and heatmap format are all correct. |
Flowchart
%%{init: {'theme': 'neutral'}}%%
flowchart TD
A[add_block_pipeline_inner] --> B[execute_block_pipeline]
B --> C{produced_bal present?}
C -- Some --> D[Use produced_bal]
C -- None --> E[Use bal param fallback]
D --> F[Merge via .or]
E --> F
F --> G{BAL available?}
G -- Yes --> H[Compute metrics]
H --> H1[account_count from accounts len]
H --> H2[slot_count from item_count minus account_count]
H --> H3[size_bytes from RLP length]
H1 & H2 & H3 --> I[METRICS_BLOCKS update]
I --> I1[inc_bal_blocks_total IntCounter]
I --> I2[set_bal_size_bytes Gauge plus Histogram]
I --> I3[set_bal_account_count IntGauge]
I --> I4[set_bal_slot_count IntGauge]
G -- No --> J[Skip BAL metrics]
I1 & I2 & I3 & I4 --> K[gather_metrics Prometheus scrape]
K --> L[Grafana BAL EIP-7928 row]
Prompt To Fix All With AI
Fix the following 2 code review issues. Work through them one at a time, proposing concise fixes.
---
### Issue 1 of 2
crates/blockchain/metrics/blocks.rs:186-192
The `0.0` bucket prepended to the histogram will never be populated: `set_bal_size_bytes` is only called inside the `if let Some(bal) = …` branch, so the observed value is always the RLP-encoded length of a non-empty BAL (strictly > 0). The `le=0` bucket will always be 0, adding dead weight to every Prometheus scrape and making the heatmap bucket range slightly misleading. Consider starting the range at 1 byte instead, or simply dropping the prepend.
```suggestion
.buckets(
prometheus::exponential_buckets(1024.0, 2.0, 16)
.expect("Invalid bucket params"),
),
```
### Issue 2 of 2
crates/blockchain/blockchain.rs:1967-1975
**Variable shadowing of `bal` parameter**
The `if let Some(bal) = produced_bal.as_ref().or(bal)` pattern silently shadows the outer `bal: Option<&BlockAccessList>` parameter with a `&BlockAccessList` binding of the same name. While the compiler accepts this, a reader scanning the block may confuse the two. A distinct name such as `bal_ref` for the pattern binding would make the intent clearer at no cost.
Reviews (1): Last reviewed commit: "fix(l1): rework BAL dashboard panel layo..." | Re-trigger Greptile
| .buckets({ | ||
| let mut buckets = | ||
| prometheus::exponential_buckets(1024.0, 2.0, 16) | ||
| .expect("Invalid bucket params"); | ||
| buckets.insert(0, 0.0); | ||
| buckets | ||
| }), |
There was a problem hiding this comment.
The
0.0 bucket prepended to the histogram will never be populated: set_bal_size_bytes is only called inside the if let Some(bal) = … branch, so the observed value is always the RLP-encoded length of a non-empty BAL (strictly > 0). The le=0 bucket will always be 0, adding dead weight to every Prometheus scrape and making the heatmap bucket range slightly misleading. Consider starting the range at 1 byte instead, or simply dropping the prepend.
| .buckets({ | |
| let mut buckets = | |
| prometheus::exponential_buckets(1024.0, 2.0, 16) | |
| .expect("Invalid bucket params"); | |
| buckets.insert(0, 0.0); | |
| buckets | |
| }), | |
| .buckets( | |
| prometheus::exponential_buckets(1024.0, 2.0, 16) | |
| .expect("Invalid bucket params"), | |
| ), |
Prompt To Fix With AI
This is a comment left during a code review.
Path: crates/blockchain/metrics/blocks.rs
Line: 186-192
Comment:
The `0.0` bucket prepended to the histogram will never be populated: `set_bal_size_bytes` is only called inside the `if let Some(bal) = …` branch, so the observed value is always the RLP-encoded length of a non-empty BAL (strictly > 0). The `le=0` bucket will always be 0, adding dead weight to every Prometheus scrape and making the heatmap bucket range slightly misleading. Consider starting the range at 1 byte instead, or simply dropping the prepend.
```suggestion
.buckets(
prometheus::exponential_buckets(1024.0, 2.0, 16)
.expect("Invalid bucket params"),
),
```
How can I resolve this? If you propose a fix, please make it concise.| metrics!(if let Some(bal) = produced_bal.as_ref().or(bal) { | ||
| let account_count = bal.accounts().len() as u64; | ||
| let slot_count = bal.item_count().saturating_sub(account_count); | ||
| let size_bytes = bal.length() as f64; | ||
| METRICS_BLOCKS.inc_bal_blocks_total(); | ||
| METRICS_BLOCKS.set_bal_size_bytes(size_bytes); | ||
| METRICS_BLOCKS.set_bal_account_count(account_count as i64); | ||
| METRICS_BLOCKS.set_bal_slot_count(slot_count as i64); | ||
| }); |
There was a problem hiding this comment.
Variable shadowing of
bal parameter
The if let Some(bal) = produced_bal.as_ref().or(bal) pattern silently shadows the outer bal: Option<&BlockAccessList> parameter with a &BlockAccessList binding of the same name. While the compiler accepts this, a reader scanning the block may confuse the two. A distinct name such as bal_ref for the pattern binding would make the intent clearer at no cost.
Prompt To Fix With AI
This is a comment left during a code review.
Path: crates/blockchain/blockchain.rs
Line: 1967-1975
Comment:
**Variable shadowing of `bal` parameter**
The `if let Some(bal) = produced_bal.as_ref().or(bal)` pattern silently shadows the outer `bal: Option<&BlockAccessList>` parameter with a `&BlockAccessList` binding of the same name. While the compiler accepts this, a reader scanning the block may confuse the two. A distinct name such as `bal_ref` for the pattern binding would make the intent clearer at no cost.
How can I resolve this? If you propose a fix, please make it concise.Note: If this suggestion doesn't match your team's coding style, reply to this and let me know. I'll remember it for next time!
ElFantasma
left a comment
There was a problem hiding this comment.
NOTE 1: this PR is adding to the divergent side of an existing split in how Prometheus is wired — same shape as the concern I flagged in #6466 (review body).
There are two patterns in the codebase today:
-
register_*macro / default-registry pattern (2 modules):rpc.rs(introduced in #5335, 2025-11-24) andsync.rs(#6470, 2026-04-20). Each metric is created withregister_int_gauge!/register_counter_vec!/ etc., which registers it into Prometheus's default registry at construction time.api.rssurfaces them automatically viagather_default_metrics(). -
Manual
Registry::new()pattern (6 modules):blocks.rs(this one),node.rs,p2p.rs,process.rs,transactions.rs,l2/metrics.rs. Each module owns a singleton struct of bare collectors, exposes agather_metrics()method that builds a freshRegistry::new()per call, andapi.rs:40has to know about each module and concatenate their output.
We should be moving towards the first pattern; that's tech debt to be addressed in some future effort. But this PR adds 5 new instruments from scratch — a free moment to land them in the better shape rather than grow blocks.rs's divergent surface by 5 entries. Concrete deltas vs the current diff:
-
Move BAL metrics to a sibling module
crates/blockchain/metrics/bal.rs(mirroringsync.rs). Useregister_int_counter!("ethrex_bal_blocks_total", ...),register_gauge!("ethrex_bal_size_bytes", ...),register_histogram_with_buckets!("ethrex_bal_size_bytes", ...),register_int_gauge!("ethrex_bal_account_count", ...),register_int_gauge!("ethrex_bal_slot_count", ...). Each macro registers into the default Prometheus registry at construction time. New metrics auto-surface viagather_default_metrics()with zeroapi.rschange. -
Naming becomes
ethrex_bal_*(with theethrex_prefix), matching theethrex_sync_*/ethrex_rpc_*convention used in the macro-pattern modules. The barebal_*names the current diff uses match the cross-client OTel BAL spec literally, but most cross-client dashboards query with regex (.*bal_.*) — confirming this against the spec text before merge would lock in the right name. If the spec mandates literalbal_*with no client-prefix, the macro pattern doesn't fit and the current shape stays; if it allows a<client>_prefix, the macro pattern is the better landing spot. -
Drop the
set_bal_*/inc_bal_blocks_totalmethods onMetricsBlocks. With the new module's instruments beingpub static LazyLock<...>, the call sites inblockchain.rsbecomeMETRICS_BAL.size_bytes.set(size)/METRICS_BAL.blocks_total.inc()— direct collector access, no wrapper methods. -
Drop
api.rs's explicitMETRICS_BLOCKS.gather_metrics()call for the BAL slice. Once the metrics live in the default registry via macros,api.rsdoesn't need to know they exist. Mirrors the comment atapi.rs:57-58from #6470: "METRICS_SYNC registers into the default Prometheus registry at init, so its metrics are already included ingather_default_metrics()above."
The 5 instruments map 1:1 to the current diff — same types, same descriptions, same produced_bal.as_ref().or(bal) call site in blockchain.rs. Mechanical refactor; the only behavioral change is the metric names gaining the ethrex_ prefix (point 2 above).
Note 2: test plan + CHANGELOG gaps. make localnet against an Amsterdam-active config, confirm the BAL row populates post-fork is unchecked — that's the validation that proves the dashboard actually shows data. And no CHANGELOG entry for an observability addition; a one-liner under "Observability" would help operators find it later.
The two inline observations below stand independently of the manual-vs-macro choice above — they apply either way.
| "Distribution of RLP-encoded Block Access List sizes in bytes", | ||
| ) | ||
| .buckets({ | ||
| let mut buckets = |
There was a problem hiding this comment.
Histogram bucket choice. exponential_buckets(1024, 2.0, 16) produces buckets at 1 KiB, 2 KiB, 4 KiB, …, 32 MiB (16 buckets), then buckets.insert(0, 0.0) prepends a zero floor. The implementation is fine and the comment cites the right range, but the literal 1024.0, 2.0, 16 is easy to misread; future-you wondering "is the top bucket 16 MiB or 32 MiB?" has to do the arithmetic.
Cheap improvement: a named local or constants:
const BAL_HISTOGRAM_BASE: f64 = 1024.0; // 1 KiB
const BAL_HISTOGRAM_BUCKETS: usize = 16; // up to ~32 MiB at base 2or inline-comment the buckets list:
.buckets({
// 0, 1 KiB, 2 KiB, 4 KiB, 8 KiB, 16 KiB, 32 KiB, 64 KiB, 128 KiB,
// 256 KiB, 512 KiB, 1 MiB, 2 MiB, 4 MiB, 8 MiB, 16 MiB, 32 MiB.
let mut buckets = prometheus::exponential_buckets(1024.0, 2.0, 16)Low priority; just readability.
| } | ||
|
|
||
| metrics!(if let Some(bal) = produced_bal.as_ref().or(bal) { | ||
| let account_count = bal.accounts().len() as u64; |
There was a problem hiding this comment.
nit: bal_size_bytes / bal_account_count / bal_slot_count are per-block-snapshot Gauges (replaced on every new BAL block). The existing convention at lines ~1997-2000 in the same file uses latest_* for the same shape (set_latest_gas_used, set_latest_block_gas_limit, set_latest_gigagas).
If you stick with the manual-Registry pattern in blocks.rs (i.e. don't take the body's Body Finding 1 suggestion), worth at least a one-line doc-comment on each field declaration noting it's a snapshot Gauge (not an instantaneous value) so future readers don't expect monotonic semantics:
/// RLP-encoded size of the most recent BAL, in bytes (per-block snapshot).
bal_size_bytes: Gauge,If you take the body's suggestion and move to a new bal.rs module using the macro pattern, the naming question shifts to whether the cross-client OTel BAL spec requires literal bal_* or allows an <client>_bal_* prefix — see the body. Non-blocking either way.
Summary
Adds Prometheus instruments for EIP-7928 Block Access Lists and a "BAL (EIP-7928)" row in
ethrex_l1_perf.jsonto surface them. Aligns with the metrics subset of the cross-client BAL OTel spec (https://notes.ethereum.org/@ethpandaops/bal-otel); OTel span tracing is a separate follow-up.Instruments
bal_blocks_totalbal_size_bytes+bal_size_bytes_histogrambal_account_countbal_slot_countBlockAccessList::item_count()minus per-address rowsDashboard panels
rate(bal_blocks_total[$__rate_interval])bal_size_bytessum by (le)(rate(bal_size_bytes_histogram_bucket[$__rate_interval]))bal_account_countbal_slot_countThe
ethereum-package/src/grafana/ethrex_l1_perf.jsoncopy is regenerated bymake localnetfrom the canonical file.Test plan
cargo check --bin ethrexcargo check -p ethrex-blockchain --no-default-featurescargo fmt --all --checkcargo clippy --workspace --no-deps -- -D warningsmake localnetagainst an Amsterdam-active config, confirm the BAL row populates post-fork