feat(l1): bal prometheus instruments, dashboard panels, and localnet fixture#6678

Open
edg-l wants to merge 3 commits into main from feat/bal-prometheus-metrics

@edg-l
Contributor

@edg-l edg-l commented May 19, 2026

Summary

Adds Prometheus instruments for EIP-7928 Block Access Lists and a "BAL (EIP-7928)" row in ethrex_l1_perf.json to surface them. Aligns with the metrics subset of the cross-client BAL OTel spec (https://notes.ethereum.org/@ethpandaops/bal-otel); OTel span tracing is a separate follow-up.

Instruments

| name | type | description |
| --- | --- | --- |
| bal_blocks_total | IntCounter | cumulative BAL-carrying blocks processed |
| bal_size_bytes + bal_size_bytes_histogram | Gauge + Histogram | RLP-encoded BAL size and distribution (exponential buckets 0, 1 KiB, 2 KiB, ..., 32 MiB) |
| bal_account_count | IntGauge | account count in the latest BAL |
| bal_slot_count | IntGauge | unique storage slots (writes + reads), via BlockAccessList::item_count() minus per-address rows |

Dashboard panels

| panel | type | promql |
| --- | --- | --- |
| BAL Blocks Rate | timeseries | rate(bal_blocks_total[$__rate_interval]) |
| BAL Size (bytes) | timeseries | bal_size_bytes |
| BAL Size Distribution | heatmap | sum by (le)(rate(bal_size_bytes_histogram_bucket[$__rate_interval])) |
| BAL Account Count | timeseries | bal_account_count |
| BAL Slot Count | timeseries | bal_slot_count |

The ethereum-package/src/grafana/ethrex_l1_perf.json copy is regenerated by make localnet from the canonical file.

Test plan

  • cargo check --bin ethrex
  • cargo check -p ethrex-blockchain --no-default-features
  • cargo fmt --all --check
  • cargo clippy --workspace --no-deps -- -D warnings
  • make localnet against an Amsterdam-active config, confirm the BAL row populates post-fork

@github-actions

github-actions Bot commented May 19, 2026

⚠️ Known Issues — intentionally skipped tests

Source: docs/known_issues.md

Known Issues

Tests intentionally excluded from CI. Source of truth for the Known Issues
section the L1 workflow appends to each ef-tests job summary
and posts as a sticky PR comment.

EF Tests — Stateless coverage narrowed to EIP-8025 optional-proofs

make -C tooling/ef_tests/blockchain test calls test-stateless-zkevm
instead of test-stateless. The zkevm@v0.3.3 fixtures are filled against
bal@v5.6.1, out of sync with current bal spec; the broad target trips ~549
fixtures. Re-broaden once the zkevm bundle is regenerated.

Why and resolution path

PR #6527 broadened
test-stateless to extract the entire for_amsterdam/ tree from the
zkevm bundle and run all of it under --features stateless; combined with
this branch's bal-devnet-7 semantics that scope produces ~549
GasUsedMismatch / ReceiptsRootMismatch /
BlockAccessListHashMismatch failures.

test-stateless-zkevm filters cargo to the eip8025_optional_proofs
suite, which still validates the stateless harness without the bal-version
mismatch.

Re-broaden by switching test: back to test-stateless in
tooling/ef_tests/blockchain/Makefile once the zkevm bundle is regenerated
against the current bal spec.

@edg-l edg-l changed the title from "feat(metrics): BAL Prometheus instruments + dashboard panels + localnet fixture" to "feat(l1): bal prometheus instruments, dashboard panels, and localnet fixture" May 19, 2026
@github-actions github-actions Bot added the L1 Ethereum client label May 19, 2026
@github-actions

github-actions Bot commented May 19, 2026

Lines of code report

Total lines added: 71
Total lines removed: 0
Total lines changed: 71

Detailed view
+--------------------------------------------+-------+------+
| File                                       | Lines | Diff |
+--------------------------------------------+-------+------+
| ethrex/crates/blockchain/blockchain.rs     | 2555  | +9   |
+--------------------------------------------+-------+------+
| ethrex/crates/blockchain/metrics/blocks.rs | 313   | +62  |
+--------------------------------------------+-------+------+

@edg-l edg-l force-pushed the feat/bal-prometheus-metrics branch from 57bbd3c to 135f7e9 Compare May 19, 2026 11:32
edg-l added 2 commits May 19, 2026 13:46
Adds five Prometheus instruments under `ethrex-metrics::blocks`:

- `bal_blocks_total` (IntCounter): cumulative count of BAL-carrying
  blocks processed.
- `bal_size_bytes` (Gauge) + `bal_size_bytes_histogram` (Histogram):
  RLP-encoded size of the most recent BAL and its distribution
  (exponential buckets 0, 1 KiB, 2 KiB, ..., 32 MiB).
- `bal_account_count` (IntGauge): account count in the most recent BAL.
- `bal_slot_count` (IntGauge): unique storage slots (writes + reads)
  in the most recent BAL; uses `BlockAccessList::item_count()` minus
  the per-address rows.

Recorded from `add_block_pipeline_inner`. Source-of-truth at the
recording site is `produced_bal.as_ref().or(bal)`: builder path's
produced BAL when present, otherwise the validation path's header BAL.
Pre-Amsterdam blocks (no BAL) skip recording.

The computation lives inside `metrics!()` so it compiles out entirely
on builds without the `metrics` feature.

Aligns with the cross-client BAL OTel spec
(https://notes.ethereum.org/@ethpandaops/bal-otel) for the metrics
subset; OTel spans are a separate follow-up.

Adds a "BAL (EIP-7928)" collapsed row to ethrex_l1_perf.json with five
panels driven by the new bal_* Prometheus instruments:

- BAL Blocks Rate: rate(bal_blocks_total)
- BAL Size (bytes): bal_size_bytes
- BAL Size Distribution: heatmap on bal_size_bytes_histogram_bucket
- BAL Account Count: bal_account_count
- BAL Slot Count: bal_slot_count

Placed after the "Block Execution Breakdown" row. The
ethereum-package copy of the dashboard is regenerated by `make localnet`
from this canonical file, so only the canonical copy under
metrics/provisioning/grafana/dashboards/common_dashboards/ is tracked.
@edg-l edg-l force-pushed the feat/bal-prometheus-metrics branch from 135f7e9 to 4072679 Compare May 19, 2026 11:47
@edg-l edg-l moved this to In Progress in ethrex_l1 May 19, 2026
Previous panel gridPos used y=0/y=8 which conflicted with the row
header at y=26, causing Grafana to auto-reflow the panels into a
jagged layout. Use absolute y=27 / y=35 (matching the row's own y),
and switch to a clean 4-up timeseries strip (Blocks Rate, Size,
Account Count, Slot Count) with the Size Distribution heatmap as a
full-width panel underneath.
@edg-l edg-l marked this pull request as ready for review May 19, 2026 12:29
@edg-l edg-l requested a review from a team as a code owner May 19, 2026 12:29
@ethrex-project-sync ethrex-project-sync Bot moved this from In Progress to In Review in ethrex_l1 May 19, 2026
@github-actions

🤖 Kimi Code Review

The changes look correct and well-implemented. The BAL (EIP-7928) metrics are properly typed, registered, and integrated into the existing observability stack.

Minor observations:

  1. Slot count calculation assumption (blockchain.rs:1967): The logic bal.item_count().saturating_sub(account_count) assumes item_count() returns the total number of RLP items (accounts + slots) in a flat structure. Verify this matches the actual BlockAccessList encoding format—if the structure is nested (e.g., each account contains its slot list), this calculation will be incorrect.

  2. Floating point precision (blockchain.rs:1969): Casting bal.length() to f64 is safe in practice (precise up to 9PB), but document this assumption if BAL sizes could theoretically exceed 2^53 bytes.

  3. Dashboard JSON (ethrex_l1_perf.json:4716): The unicode escape \u2014 (em-dash) is correctly used, ensuring consistent character encoding.
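On observation 2, the 2^53 boundary can be checked directly. The following is a minimal stdlib-only sketch; `F64_EXACT_INT_LIMIT` and `is_exact_as_f64` are hypothetical names used for illustration and are not part of the PR.

```rust
/// Every integer in 0..=2^53 is exactly representable as an f64.
pub const F64_EXACT_INT_LIMIT: u64 = 1 << 53;

/// Returns true when `len` converts to f64 without rounding.
/// The comparison is done in u128 so that values near u64::MAX,
/// which round up to 2^64, are not hidden by a saturating cast.
pub fn is_exact_as_f64(len: u64) -> bool {
    (len as f64) as u128 == len as u128
}
```

A 32 MiB BAL length is far below the limit, so the `as f64` cast loses nothing at any realistic size.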

Security/Performance: No issues. Metrics collection is read-only, occurs once per block, and uses atomic operations internally.

The implementation follows existing patterns in MetricsBlocks and correctly uses IntCounter for cumulative counts and IntGauge/Gauge for point-in-time values.


Automated review by Kimi (Moonshot AI) · kimi-k2.5 · custom prompt

@github-actions

🤖 Claude Code Review



PR #6678: feat(l1): bal prometheus instruments, dashboard panels, and localnet fixture

Overview

This PR instruments EIP-7928 Block Access List (BAL) processing with five Prometheus metrics (bal_blocks_total, bal_size_bytes, bal_size_bytes_histogram, bal_account_count, bal_slot_count) and adds a corresponding "BAL (EIP-7928)" row to the Grafana dashboard. The implementation is straightforward and well-scoped.


Correctness

slot_count calculation — correct but relies on an implicit spec invariant

blockchain.rs:1969

let slot_count = bal.item_count().saturating_sub(account_count);

item_count() counts 1 (address) + storage_reads.len() + storage_changes.len() per account. Subtracting account_count yields the sum of reads + writes. The metric is labelled "unique storage slots", which is only accurate if storage_reads and storage_changes are disjoint per the EIP-7928 spec (a slot written to should not appear in storage_reads). This appears to be a spec invariant, but it is not enforced locally. A brief comment to that effect would help future maintainers. No action needed unless you want the defence-in-depth.
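To make the disjointness assumption concrete, here is a small sketch using a simplified stand-in type. `AccountAccess` and the free functions below are hypothetical and only mirror the counting rule described above; they are not the ethrex `BlockAccessList` API.

```rust
use std::collections::BTreeSet;

/// Hypothetical, simplified stand-in for one account entry in a BAL.
pub struct AccountAccess {
    pub storage_reads: Vec<u64>,
    pub storage_changes: Vec<u64>,
}

/// Mirrors the described rule: 1 per address, plus reads, plus writes.
pub fn item_count(accounts: &[AccountAccess]) -> u64 {
    accounts
        .iter()
        .map(|a| 1 + a.storage_reads.len() as u64 + a.storage_changes.len() as u64)
        .sum()
}

/// The metric as computed in the PR: item_count minus the per-address rows.
pub fn slot_count(accounts: &[AccountAccess]) -> u64 {
    item_count(accounts).saturating_sub(accounts.len() as u64)
}

/// Truly unique slots per account, deduplicating reads against writes.
pub fn unique_slot_count(accounts: &[AccountAccess]) -> u64 {
    accounts
        .iter()
        .map(|a| {
            a.storage_reads
                .iter()
                .chain(a.storage_changes.iter())
                .collect::<BTreeSet<_>>()
                .len() as u64
        })
        .sum()
}
```

The two counts agree whenever reads and writes are disjoint, and `slot_count` overcounts by one for each slot that appears in both lists.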

produced_bal.as_ref().or(bal) — correct and intentional

When both a produced BAL and incoming BAL are None, the metrics! block is skipped entirely, so inc_bal_blocks_total() is never called spuriously.
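The selection logic is plain `Option::or`. A minimal sketch with a placeholder type follows; `Bal` and `select_bal` are hypothetical names, and the real code works on `BlockAccessList`.

```rust
/// Placeholder for a block access list; the payload is just an id here.
#[derive(Debug, PartialEq)]
pub struct Bal(pub u32);

/// Prefer the locally produced BAL, fall back to the one supplied by the caller.
pub fn select_bal<'a>(produced: &'a Option<Bal>, header: Option<&'a Bal>) -> Option<&'a Bal> {
    produced.as_ref().or(header)
}
```

With both inputs `None` the result is `None`, which is the case where the metrics block is skipped.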


Performance

bal.length() runs a full RLP encoding traversal every block (blockchain.rs:1970)

RLPEncode::length() works by calling encode() against a byte-counter — it's a full O(n) structural walk. Combined with item_count() (a separate O(n) pass), BAL metrics collection performs two complete traversals of the BAL tree per block when metrics are enabled. For a large BAL approaching the spec's size limit, this is non-trivial. Since both calls are inside the metrics!() macro guard this only applies to enabled metric builds, but consider caching or combining them if profiling reveals cost:

// single-pass alternative for account_count and slot_count (the RLP size still needs bal.length())
let (account_count, slot_count) = bal.accounts().iter().fold((0u64, 0u64), |(a, s), acc| {
    (a + 1, s + acc.storage_reads.len() as u64 + acc.storage_changes.len() as u64)
});

(Low priority; purely advisory.)


Prometheus / Metrics Issues

0.0 bucket is dead weight

blocks.rs:189–192

buckets.insert(0, 0.0);

In a Prometheus histogram, le=0 only matches observations of exactly 0.0. Since the code path only runs when Some(bal) is matched and an existing BAL will always encode to at least a few bytes, this bucket will never receive an observation. Remove it to reduce noise in TSDB and dashboard queries.
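The effect is easy to see with cumulative bucket counting. This is a stdlib-only sketch of the "count observations less than or equal to each bound" rule, not the prometheus crate's implementation; `cumulative_counts` is a hypothetical helper.

```rust
/// Cumulative bucket counts: entry i counts observations with value <= bounds[i].
pub fn cumulative_counts(bounds: &[f64], observations: &[f64]) -> Vec<u64> {
    bounds
        .iter()
        .map(|le| observations.iter().filter(|v| **v <= *le).count() as u64)
        .collect()
}
```

For strictly positive sizes the first entry (the `le=0` bound) stays at zero no matter how many BALs are observed.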

_histogram suffix in metric name is non-standard

The metric bal_size_bytes_histogram embeds the type in the name. Prometheus convention omits the type (the _bucket, _sum, _count suffixes make the type clear). The suffix was chosen to avoid collision with the bal_size_bytes Gauge, which is reasonable, but a cleaner pair would be:

  • bal_rlp_size_bytes (Gauge — current value)
  • bal_size_bytes (Histogram — distribution)

This isn't a blocker, but the current naming may confuse users querying the TSDB.


Grafana Dashboard

Stale default thresholds in all four timeseries panels

Each timeseries panel retains the Grafana default "color": "red", "value": 80 threshold. For byte and count metrics this threshold is meaningless (80 bytes? 80 accounts?) and will colour perfectly normal metric values red. Remove the red step or set a domain-appropriate value.

Affected panel IDs: 301 (BAL Blocks Rate), 302 (BAL Size (bytes)), 304 (BAL Account Count), 305 (BAL Slot Count).

Example fix — remove the red step from each panel's thresholds.steps:

"steps": [
  { "color": "green", "value": 0 }
]

Panel ID ordering is non-sequential in JSON

IDs go 301 → 302 → 304 → 305 → 303. Not a functional issue, but it makes manual JSON navigation slightly harder. Low priority.


Code Quality

  • The double cast (account_count as u64, then account_count as i64; lines 1968/1973) is safe and unremarkable.
  • All five expect() calls in Default for metric construction are appropriate — a panic at startup from a bad bucket config is the right behaviour.
  • Docstrings on the new struct fields and public methods are clear and consistent with the rest of the file.
  • The ethereum-package JSON sync note in the PR description is good housekeeping.

Summary

Two items are worth addressing before merge:

  • Remove the 0.0 histogram bucket — it's dead and pollutes the TSDB.
  • Clean up the value: 80 red thresholds in the four timeseries Grafana panels.

The Rust logic is correct, the metric names are acceptable (if slightly non-standard for the histogram), and the overall implementation aligns with the BAL OTel spec referenced in the PR. No security or consensus concerns.


Automated review by Claude (Anthropic) · sonnet · custom prompt

@github-actions

🤖 Codex Code Review

  1. crates/blockchain/blockchain.rs:1943-1974 updates the new BAL metrics before store_block(...) is known to have succeeded. If the block later fails validate_state_root or store_block_updates, bal_blocks_total and the “latest BAL” gauges will still reflect an invalid/unpersisted block. That makes the dashboard lie on exactly the consensus/storage failures operators care about. I’d move the BAL metric updates behind a successful result check, or update them only after result?.

  2. crates/blockchain/blockchain.rs:1172-1189 and crates/blockchain/blockchain.rs:2267-2341 leave the new BAL metrics blind in the batch import path. execute_block_from_state() computes and validates a BAL, but returns only BlockExecutionResult, and add_blocks_in_batch() never increments bal_blocks_total or updates the BAL gauges/histogram. Since full sync prefers batch mode, the new Grafana row will undercount or stay stale for the common import path. If these metrics are meant to describe “processed” BAL blocks, they need to be plumbed through batch execution as well.
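
For point 1, the ordering can be sketched as follows. This is a stdlib-only illustration under the assumption that the store step returns a `Result`; `BalCounters` and `finish_block` are hypothetical names, and the real counters would be Prometheus collectors.

```rust
use std::sync::atomic::{AtomicU64, Ordering};

/// Hypothetical stand-in for the BAL counter.
#[derive(Default)]
pub struct BalCounters {
    pub blocks_total: AtomicU64,
}

/// Propagate any validation or storage failure first, then record metrics.
pub fn finish_block(
    store_result: Result<(), String>,
    has_bal: bool,
    counters: &BalCounters,
) -> Result<(), String> {
    // Early return on failure: nothing below runs for a rejected block.
    store_result?;
    if has_bal {
        counters.blocks_total.fetch_add(1, Ordering::Relaxed);
    }
    Ok(())
}
```

With this ordering a block that fails to persist never increments the counter or overwrites the gauges.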

No EVM opcode / gas-accounting / trie / RLP correctness issues jumped out in the Rust changes themselves; the concerns here are about metrics correctness and coverage.


Automated review by OpenAI Codex · gpt-5.4 · custom prompt

@greptile-apps

greptile-apps Bot commented May 19, 2026

Greptile Summary

This PR instruments the EIP-7928 Block Access List (BAL) path with five Prometheus metrics, wires them into the block pipeline, and adds a "BAL (EIP-7928)" collapsed row with five Grafana panels to ethrex_l1_perf.json.

  • crates/blockchain/metrics/blocks.rs: introduces bal_blocks_total (IntCounter), bal_size_bytes (Gauge + Histogram), bal_account_count (IntGauge), and bal_slot_count (IntGauge); all are registered into the per-scrape registry following the established pattern.
  • crates/blockchain/blockchain.rs: captures BAL metrics after the block pipeline completes, preferring the locally-produced BAL (produced_bal) over the externally-supplied one (bal) via .or(); the slot_count derivation (item_count() − account_count) is consistent with item_count()'s own documentation guaranteeing uniqueness.
  • ethrex_l1_perf.json: adds five correctly-wired panels (timeseries for rate/size/accounts/slots, full-width heatmap for size distribution) with proper Prometheus label filtering.

Confidence Score: 4/5

Safe to merge; the change adds observability-only instrumentation behind the existing metrics macro guard with no effect on block validation or consensus logic.

All three files introduce metrics and dashboard configuration with no execution-critical code paths altered. The slot_count formula is consistent with the EIP-7928 BAL uniqueness invariant and the produced_bal fallback is a correct Option combinator. Both findings are purely stylistic and do not affect correctness or runtime behavior.

No files require special attention; all changes are additive metrics instrumentation.

Important Files Changed

| Filename | Overview |
| --- | --- |
| crates/blockchain/metrics/blocks.rs | Adds five BAL metrics (IntCounter, Gauge, Histogram, two IntGauges) with correct registration and accessor methods; the 0.0 histogram bucket is dead code since BALs are always > 0 bytes when observed. |
| crates/blockchain/blockchain.rs | Instruments the block pipeline with BAL metrics using produced_bal.as_ref().or(bal) to prefer locally-executed BAL; slot count derived from item_count() minus account_count is consistent with EIP-7928 uniqueness guarantee. |
| metrics/provisioning/grafana/dashboards/common_dashboards/ethrex_l1_perf.json | Adds 5 Grafana panels in a collapsed BAL (EIP-7928) row with timeseries and heatmap panels; PromQL queries, units, and heatmap format are all correct. |

Flowchart

%%{init: {'theme': 'neutral'}}%%
flowchart TD
    A[add_block_pipeline_inner] --> B[execute_block_pipeline]
    B --> C{produced_bal present?}
    C -- Some --> D[Use produced_bal]
    C -- None --> E[Use bal param fallback]
    D --> F[Merge via .or]
    E --> F
    F --> G{BAL available?}
    G -- Yes --> H[Compute metrics]
    H --> H1[account_count from accounts len]
    H --> H2[slot_count from item_count minus account_count]
    H --> H3[size_bytes from RLP length]
    H1 & H2 & H3 --> I[METRICS_BLOCKS update]
    I --> I1[inc_bal_blocks_total IntCounter]
    I --> I2[set_bal_size_bytes Gauge plus Histogram]
    I --> I3[set_bal_account_count IntGauge]
    I --> I4[set_bal_slot_count IntGauge]
    G -- No --> J[Skip BAL metrics]
    I1 & I2 & I3 & I4 --> K[gather_metrics Prometheus scrape]
    K --> L[Grafana BAL EIP-7928 row]

Reviews (1): Last reviewed commit: "fix(l1): rework BAL dashboard panel layo..." | Re-trigger Greptile

Comment on lines +186 to +192
.buckets({
    let mut buckets =
        prometheus::exponential_buckets(1024.0, 2.0, 16)
            .expect("Invalid bucket params");
    buckets.insert(0, 0.0);
    buckets
}),

P2 The 0.0 bucket prepended to the histogram will never be populated: set_bal_size_bytes is only called inside the if let Some(bal) = … branch, so the observed value is always the RLP-encoded length of a non-empty BAL (strictly > 0). The le=0 bucket will always be 0, adding dead weight to every Prometheus scrape and making the heatmap bucket range slightly misleading. Consider starting the range at 1 byte instead, or simply dropping the prepend.

Suggested change
- .buckets({
-     let mut buckets =
-         prometheus::exponential_buckets(1024.0, 2.0, 16)
-             .expect("Invalid bucket params");
-     buckets.insert(0, 0.0);
-     buckets
- }),
+ .buckets(
+     prometheus::exponential_buckets(1024.0, 2.0, 16)
+         .expect("Invalid bucket params"),
+ ),

Comment on lines +1967 to +1975
metrics!(if let Some(bal) = produced_bal.as_ref().or(bal) {
    let account_count = bal.accounts().len() as u64;
    let slot_count = bal.item_count().saturating_sub(account_count);
    let size_bytes = bal.length() as f64;
    METRICS_BLOCKS.inc_bal_blocks_total();
    METRICS_BLOCKS.set_bal_size_bytes(size_bytes);
    METRICS_BLOCKS.set_bal_account_count(account_count as i64);
    METRICS_BLOCKS.set_bal_slot_count(slot_count as i64);
});

P2 Variable shadowing of bal parameter

The if let Some(bal) = produced_bal.as_ref().or(bal) pattern silently shadows the outer bal: Option<&BlockAccessList> parameter with a &BlockAccessList binding of the same name. While the compiler accepts this, a reader scanning the block may confuse the two. A distinct name such as bal_ref for the pattern binding would make the intent clearer at no cost.


Contributor

@ElFantasma ElFantasma left a comment


NOTE 1: this PR is adding to the divergent side of an existing split in how Prometheus is wired — same shape as the concern I flagged in #6466 (review body).

There are two patterns in the codebase today:

  • register_* macro / default-registry pattern (2 modules): rpc.rs (introduced in #5335, 2025-11-24) and sync.rs (#6470, 2026-04-20). Each metric is created with register_int_gauge! / register_counter_vec! / etc., which registers it into Prometheus's default registry at construction time. api.rs surfaces them automatically via gather_default_metrics().

  • Manual Registry::new() pattern (6 modules): blocks.rs (this one), node.rs, p2p.rs, process.rs, transactions.rs, l2/metrics.rs. Each module owns a singleton struct of bare collectors, exposes a gather_metrics() method that builds a fresh Registry::new() per call, and api.rs:40 has to know about each module and concatenate their output.

We should be moving towards the first pattern; that's tech debt to be addressed in some future effort. But this PR adds 5 new instruments from scratch — a free moment to land them in the better shape rather than grow blocks.rs's divergent surface by 5 entries. Concrete deltas vs the current diff:

  1. Move BAL metrics to a sibling module crates/blockchain/metrics/bal.rs (mirroring sync.rs). Use register_int_counter!("ethrex_bal_blocks_total", ...), register_gauge!("ethrex_bal_size_bytes", ...), register_histogram!("ethrex_bal_size_bytes_histogram", ..., buckets), register_int_gauge!("ethrex_bal_account_count", ...), register_int_gauge!("ethrex_bal_slot_count", ...). The histogram keeps a name distinct from the gauge because a registry rejects a second collector with the same name. Each macro registers into the default Prometheus registry at construction time. New metrics auto-surface via gather_default_metrics() with zero api.rs change.

  2. Naming becomes ethrex_bal_* (with the ethrex_ prefix), matching the ethrex_sync_* / ethrex_rpc_* convention used in the macro-pattern modules. The bare bal_* names the current diff uses match the cross-client OTel BAL spec literally, but most cross-client dashboards query with regex (.*bal_.*) — confirming this against the spec text before merge would lock in the right name. If the spec mandates literal bal_* with no client-prefix, the macro pattern doesn't fit and the current shape stays; if it allows a <client>_ prefix, the macro pattern is the better landing spot.

  3. Drop the set_bal_* / inc_bal_blocks_total methods on MetricsBlocks. With the new module's instruments being pub static LazyLock<...>, the call sites in blockchain.rs become METRICS_BAL.size_bytes.set(size) / METRICS_BAL.blocks_total.inc() — direct collector access, no wrapper methods.

  4. Drop api.rs's explicit METRICS_BLOCKS.gather_metrics() call for the BAL slice. Once the metrics live in the default registry via macros, api.rs doesn't need to know they exist. Mirrors the comment at api.rs:57-58 from #6470: "METRICS_SYNC registers into the default Prometheus registry at init, so its metrics are already included in gather_default_metrics() above."

The 5 instruments map 1:1 to the current diff — same types, same descriptions, same produced_bal.as_ref().or(bal) call site in blockchain.rs. Mechanical refactor; the only behavioral change is the metric names gaining the ethrex_ prefix (point 2 above).
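
The shape of point 3 (a lazily initialised static with direct field access) can be sketched without the prometheus crate. Everything below is a hypothetical stand-in: atomics replace the real collectors, and `MetricsBal`, `METRICS_BAL`, and `record_bal` are illustrative names. It assumes Rust 1.80 or later for `std::sync::LazyLock`.

```rust
use std::sync::atomic::{AtomicI64, AtomicU64, Ordering};
use std::sync::LazyLock;

/// Hypothetical stand-in for the proposed bal.rs module.
/// Real code would hold prometheus collectors instead of atomics.
pub struct MetricsBal {
    pub blocks_total: AtomicU64,
    pub account_count: AtomicI64,
    pub slot_count: AtomicI64,
}

/// Initialised on first access; call sites touch the fields directly.
pub static METRICS_BAL: LazyLock<MetricsBal> = LazyLock::new(|| MetricsBal {
    blocks_total: AtomicU64::new(0),
    account_count: AtomicI64::new(0),
    slot_count: AtomicI64::new(0),
});

/// Example call site: one counter increment plus two snapshot updates.
pub fn record_bal(account_count: i64, slot_count: i64) {
    METRICS_BAL.blocks_total.fetch_add(1, Ordering::Relaxed);
    METRICS_BAL.account_count.store(account_count, Ordering::Relaxed);
    METRICS_BAL.slot_count.store(slot_count, Ordering::Relaxed);
}
```

The counter accumulates across calls while the two gauges only keep the most recent block's values, which matches the per-block snapshot semantics discussed in the inline comments.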


Note 2: test plan + CHANGELOG gaps. make localnet against an Amsterdam-active config, confirm the BAL row populates post-fork is unchecked — that's the validation that proves the dashboard actually shows data. And no CHANGELOG entry for an observability addition; a one-liner under "Observability" would help operators find it later.

The two inline observations below stand independently of the manual-vs-macro choice above — they apply either way.

"Distribution of RLP-encoded Block Access List sizes in bytes",
)
.buckets({
let mut buckets =

Histogram bucket choice. exponential_buckets(1024, 2.0, 16) produces buckets at 1 KiB, 2 KiB, 4 KiB, …, 32 MiB (16 buckets), then buckets.insert(0, 0.0) prepends a zero floor. The implementation is fine and the comment cites the right range, but the literal 1024.0, 2.0, 16 is easy to misread; future-you wondering "is the top bucket 16 MiB or 32 MiB?" has to do the arithmetic.

Cheap improvement: a named local or constants:

const BAL_HISTOGRAM_BASE: f64 = 1024.0;  // 1 KiB
const BAL_HISTOGRAM_BUCKETS: usize = 16;  // up to ~32 MiB at base 2

or inline-comment the buckets list:

.buckets({
    // 0, 1 KiB, 2 KiB, 4 KiB, 8 KiB, 16 KiB, 32 KiB, 64 KiB, 128 KiB,
    // 256 KiB, 512 KiB, 1 MiB, 2 MiB, 4 MiB, 8 MiB, 16 MiB, 32 MiB.
    let mut buckets = prometheus::exponential_buckets(1024.0, 2.0, 16)

Low priority; just readability.
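
For reference, the arithmetic behind the bucket list can be reproduced with a few lines of stdlib Rust. The `exponential_buckets` function below is a local re-implementation of the scheme (start, start × factor, and so on), written for illustration rather than taken from the prometheus crate; `bal_size_buckets` is a hypothetical helper that mirrors the PR's zero-floor prepend.

```rust
/// Generates `count` bounds: start, start * factor, start * factor^2, ...
pub fn exponential_buckets(start: f64, factor: f64, count: usize) -> Vec<f64> {
    let mut next = start;
    (0..count)
        .map(|_| {
            let current = next;
            next *= factor;
            current
        })
        .collect()
}

/// The PR's layout: a 0.0 floor followed by 1 KiB doubling up to 32 MiB.
pub fn bal_size_buckets() -> Vec<f64> {
    let mut buckets = exponential_buckets(1024.0, 2.0, 16);
    buckets.insert(0, 0.0);
    buckets
}
```

The sixteenth bound is 1024 × 2^15 = 33,554,432 bytes, so the top bucket is 32 MiB, not 16 MiB.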

}

metrics!(if let Some(bal) = produced_bal.as_ref().or(bal) {
let account_count = bal.accounts().len() as u64;

nit: bal_size_bytes / bal_account_count / bal_slot_count are per-block-snapshot Gauges (replaced on every new BAL block). The existing convention at lines ~1997-2000 in the same file uses latest_* for the same shape (set_latest_gas_used, set_latest_block_gas_limit, set_latest_gigagas).

If you stick with the manual-Registry pattern in blocks.rs (i.e. don't take the NOTE 1 suggestion in the review body), worth at least a one-line doc-comment on each field declaration noting it's a snapshot Gauge (not an instantaneous value) so future readers don't expect monotonic semantics:

/// RLP-encoded size of the most recent BAL, in bytes (per-block snapshot).
bal_size_bytes: Gauge,

If you take the body's suggestion and move to a new bal.rs module using the macro pattern, the naming question shifts to whether the cross-client OTel BAL spec requires literal bal_* or allows an <client>_bal_* prefix — see the body. Non-blocking either way.

Labels

L1 Ethereum client

Projects

Status: In Review

4 participants