Skip to content

fix(l1): break out of prefix iterator on mismatch in get_transaction_location#6702

Merged
ElFantasma merged 1 commit into
mainfrom
fix/6688-tx-location-prefix-scan
May 21, 2026
Merged

fix(l1): break out of prefix iterator on mismatch in get_transaction_location#6702
ElFantasma merged 1 commit into
mainfrom
fix/6688-tx-location-prefix-scan

Conversation

@ElFantasma
Copy link
Copy Markdown
Contributor

Motivation

Store::get_transaction_location was performing an unbounded RocksDB scan on every eth_getTransactionByHash call. On mainnet, this meant each call scanned hundreds of GB of compressed SSTs taking seconds per request, and under modest concurrent load (a few hundred persistent connections), it cascaded into multi-hour latencies and full saturation of the Tokio blocking pool — see #6688 for the full incident write-up.

Description

rust-rocksdb's prefix_iterator_cf seeks to the prefix but iterates forward indefinitely; the caller must break manually when the key no longer matches. The previous loop in crates/storage/store.rs:596-602 used an if filter that correctly ignored non-matching keys but didn't stop iteration, so every call scanned from the seek point to the end of the TRANSACTION_LOCATIONS column family.

Fix is one line: else { break; }. RocksDB stores keys in lexicographic order and all entries sharing a tx_hash prefix are contiguous; once we see a non-matching key, no further matches can appear ahead. Iteration is now bounded to (matches + 1) reads — typically 1-2 entries.

Verification on mainnet-10 (fresh sync, hot OS cache)

Single-call latency, identical reproducer to the issue body:

Position Hash Before After Speedup
Low (0x00…) 0x0000…001 39 ms 0.37 ms 105×
Mid 0x8000…000 15 ms 0.40 ms 37×
Real 0x454a…b611 21 ms 0.55 ms 38×
High (0xff…) 0xfff…ffe 0.20 ms 0.18 ms

The linear-with-position pattern disappears entirely — all calls flat at sub-millisecond.

Persistent-connection storm (oha -z 3m -c 200 --no-tui against worst-case low hash, same as the reproduction in issue #6688):

Metric Before After Improvement
Total requests in 3 min 39,368 47,611,407 1,210×
Throughput 218 req/s 264,504 req/s 1,213×
Mean latency ~900 ms 0.75 ms 1,200×
p50 877 ms 0.70 ms 1,253×
p95 1,517 ms 1.37 ms 1,107×
p99 1,843 ms 1.82 ms 1,012×
Slowest 2,870 ms 59.7 ms 48×
Errors 0 188 of 47.6M (deadline race at storm end) ≈0%

99.98% of requests now complete in <6 ms. The 22× bandwidth-contention amplifier observed under storm conditions is gone — the node finds a stable operating point at 264K req/s with no thread-pool saturation. The mathematical model from the issue (Little's Law-based queue growth) cleanly explains why eliminating the per-call O(table) cost defuses the entire cascade.

Follow-up improvements (separate PRs)

These are referenced in #6688 but worth restating:

  • Iterator upper bound at API level. Extend the prefix_iterator storage trait method to construct iterators with set_iterate_upper_bound(prefix + 1) so RocksDB itself stops at the bound. Removes the dependence on caller discipline. The same-PR fix here is sufficient for correctness, but a typed bounded-iterator API would prevent recurrence in any future call site.
  • Bloom filter on TRANSACTION_LOCATIONS for negative-lookup speedup. Only helps after compaction rewrites existing SSTs.
  • Schema redesign. Key by tx_hash only with the value being a Vec<(BlockNumber, BlockHash, Index)>. Reduces this from a prefix scan to a single get_cf. Best long-term option.
  • Cancellation propagation for spawn_blocking. Independent of this bug: orphaned blocking tasks from cancelled futures kept the pool saturated even after the rogue IP was blocked on mainnet-1.

Checklist

Closes #6688

@ElFantasma ElFantasma requested a review from a team as a code owner May 21, 2026 12:10
@github-actions github-actions Bot added the L1 Ethereum client label May 21, 2026
@github-actions
Copy link
Copy Markdown

github-actions Bot commented May 21, 2026

⚠️ Known Issues — intentionally skipped tests

Source: docs/known_issues.md

Known Issues

Tests intentionally excluded from CI. Source of truth for the Known
Issues
section the L1 workflow appends to each ef-tests job summary
and posts as a sticky PR comment.

EF Tests — Stateless coverage narrowed to EIP-8025 optional-proofs

make -C tooling/ef_tests/blockchain test calls test-stateless-zkevm
instead of test-stateless. The zkevm@v0.3.3 fixtures are filled against
bal@v5.6.1, out of sync with current bal spec; the broad target trips ~549
fixtures. Re-broaden once the zkevm bundle is regenerated.

Why and resolution path

PR #6527 broadened
test-stateless to extract the entire for_amsterdam/ tree from the
zkevm bundle and run all of it under --features stateless; combined with
this branch's bal-devnet-7 semantics that scope produces ~549
GasUsedMismatch / ReceiptsRootMismatch /
BlockAccessListHashMismatch failures.

test-stateless-zkevm filters cargo to the eip8025_optional_proofs
suite, which still validates the stateless harness without the bal-version
mismatch.

Re-broaden by switching test: back to test-stateless in
tooling/ef_tests/blockchain/Makefile once the zkevm bundle is regenerated
against the current bal spec.

@github-actions
Copy link
Copy Markdown

🤖 Kimi Code Review

This is a correct and necessary fix for a critical performance issue. The change properly addresses the behavior of RocksDB's prefix iterator, which doesn't automatically terminate when keys no longer match the prefix.

Correctness & Logic

  • Line 610-612 (new): The break statement is correctly placed. Since RocksDB stores keys in lexicographic order, once a key no longer starts with the target tx_hash_bytes prefix, no subsequent keys can match. The logic holds because:
    1. Transaction hashes are fixed 32 bytes, so any valid key must start with exactly those bytes
    2. Keys are sorted, so all entries for a given tx_hash are contiguous
    3. The else branch catches both non-matching prefixes AND keys with incorrect lengths (corrupted or different schema), both of which indicate we've passed the valid range

Safety

Code Quality

Minor Observation (pre-existing, not changed in this PR)

  • Line 601: The pattern while let Some(Ok((key, value))) = iter.next() silently drops iterator errors (returning Some(Err(...)) exits the loop). Consider propagating iterator errors explicitly:
    while let Some(item) = iter.next() {
        let (key, value) = item?; // Propagate RocksDB errors
        // ...
    }

Conclusion: The fix is correct, safe, and addresses a severe performance regression. Ship it.


Automated review by Kimi (Moonshot AI) · kimi-k2.5 · custom prompt

@github-actions
Copy link
Copy Markdown

🤖 Claude Code Review

PR Review: fix(l1,storage): break out of prefix iterator on mismatch in get_transaction_location

Overview

A one-line fix (else { break; }) that corrects an unbounded RocksDB prefix iterator scan in get_transaction_location. Without the break, every call would scan from the seek point to the end of the TRANSACTION_LOCATIONS column family — O(table-size) instead of O(matches + 1). The PR is minimal, well-documented, and the benchmark data is compelling.


Correctness

The fix is correct. prefix_iterator_cf seeks to the first key >= tx_hash_bytes and then iterates forward indefinitely. Since RocksDB uses bytewise lexicographic ordering by default, all 64-byte keys sharing the same first 32 bytes are contiguous. The first key that fails the match condition (key.len() != 64 or key[0..32] != tx_hash_bytes) signals that all subsequent keys also cannot match, making the break safe.

One subtle edge case worth noting for future readers: if a key of length < 64 that starts with tx_hash_bytes ever existed in this column family, it would appear before the valid 64-byte entries in lexicographic order (shorter prefix < longer extension), and the length check would fire, breaking prematurely. This is a non-issue given the fixed schema (tx_hash [32B] + block_hash [32B]), but it's worth keeping in mind if the schema ever evolves. If robustness is desired, the two conditions could be split:

// Break on prefix mismatch (ordering guarantee); skip-and-continue on malformed length
if key.len() < 32 || &key[0..32] != tx_hash_bytes {
    break;
} else if key.len() == 64 {
    transaction_locations.push(...);
}

This is a low-priority observation — the current form is fine given the stable schema.


Code Quality

  • Minimal change, maximum impact. One added line of logic, not a refactor.
  • Comment quality is high. The new block comment explains the non-obvious behaviour of prefix_iterator_cf, the ordering guarantee that makes the break safe, and links the upstream incident. This is exactly the kind of comment that earns its place (a hidden API pitfall that would not be visible from the code alone).
  • No new abstractions introduced — appropriate for a hotfix.

Performance

The benchmarks in the PR description are compelling and mechanistically explained:

  • Sub-millisecond single-call latency (previously seconds on low-hash lookups due to linear scan from start of column family).
  • ~1,210× throughput improvement under 200-connection storm.
  • The pathological amplification under concurrent load (full Tokio blocking-pool saturation → cascading queue growth) is cleanly explained via Little's Law.

Follow-up Items (noted in PR, tracked in issue)

The PR correctly defers these; they are not blockers for this fix:

  • Typed bounded-iterator API (set_iterate_upper_bound(prefix + 1)) — would make caller-discipline unnecessary and prevent recurrence at any future call site. This is the most important follow-up.
  • Bloom filter on TRANSACTION_LOCATIONS — only effective after compaction.
  • Schema redesign to key by tx_hash alone with a Vec<(BlockNumber, BlockHash, Index)> value — best long-term option; single point lookup instead of any scan.
  • spawn_blocking cancellation propagation — independent issue.

Summary

The fix is correct, minimal, and the added comment is genuinely useful (documenting a non-obvious RocksDB API contract). The benchmark data strongly validates the change. The only note is the theoretical edge case around the key.len() == 64 early-break, which is benign given the current schema but worth being aware of.

Verdict: approve. The follow-up items are well-scoped for separate PRs.


Automated review by Claude (Anthropic) · sonnet · custom prompt

@github-actions
Copy link
Copy Markdown

🤖 Codex Code Review

No findings.

crates/storage/store.rs looks correct: stopping at the first non-matching key preserves correctness because RocksDB iterates in lexicographic key order, so all tx_hash || block_hash entries are contiguous, and the table writers only emit 64-byte keys in that format at crates/storage/store.rs and crates/storage/store.rs. This removes the accidental tail scan without affecting canonical-chain selection.

Residual risk is only test coverage: I did not find a regression test that exercises RocksDB get_transaction_location() with duplicate tx hashes across forks, so adding one would make this fix harder to regress. I did not run tests locally.


Automated review by OpenAI Codex · gpt-5.4 · custom prompt

@greptile-apps
Copy link
Copy Markdown

greptile-apps Bot commented May 21, 2026

Greptile Summary

This PR fixes a critical performance regression in Store::get_transaction_location where rust-rocksdb's prefix_iterator_cf was being used without breaking out of the loop on a prefix mismatch, causing every eth_getTransactionByHash call to perform an unbounded forward scan through the entire TRANSACTION_LOCATIONS column family.

  • Adds else { break; } after the key-match check, exploiting RocksDB's lexicographic key ordering to guarantee all matching entries are contiguous — bounding the scan to (matches + 1) reads instead of O(table-size).
  • Replaces the existing if-only guard (which correctly filtered results but never halted iteration) with a proper early-exit, reducing per-call latency from tens of milliseconds to sub-millisecond on mainnet.

Confidence Score: 5/5

Safe to merge — the change is a single-clause addition that correctly bounds a previously unbounded iterator loop, supported by both the RocksDB ordering guarantee and extensive benchmarking data.

The fix is minimal, logically sound, and well-motivated. The break is correct because all TRANSACTION_LOCATIONS keys are constructed as fixed 64-byte composites (tx_hash || block_hash), so the contiguous-prefix ordering guarantee holds with no edge cases. Benchmarks confirm the expected behavior at scale.

No files require special attention; the single changed region in crates/storage/store.rs is straightforward.

Important Files Changed

Filename Overview
crates/storage/store.rs Adds else { break; } to stop the prefix iterator scan once a non-matching key is found, fixing an O(table-size) scan per eth_getTransactionByHash call and replacing it with an O(matches + 1) bounded scan.

Sequence Diagram

sequenceDiagram
    participant Caller as eth_getTransactionByHash
    participant SpawnBlocking as spawn_blocking task
    participant RocksDB as RocksDB (TRANSACTION_LOCATIONS CF)

    Caller->>SpawnBlocking: get_transaction_location(tx_hash)
    SpawnBlocking->>RocksDB: prefix_iterator(tx_hash_bytes)
    Note over RocksDB: Seeks to first key >= tx_hash_bytes

    loop Bounded scan (matches + 1 reads)
        RocksDB-->>SpawnBlocking: key, value
        alt "key.len()==64 && key[0..32]==tx_hash"
            SpawnBlocking->>SpawnBlocking: push to transaction_locations
        else key does not match (first mismatch)
            SpawnBlocking->>SpawnBlocking: break (NEW)
            Note over SpawnBlocking: No further matches possible due to lexicographic ordering
        end
    end

    SpawnBlocking->>SpawnBlocking: Filter by canonical chain
    SpawnBlocking-->>Caller: "Option<(BlockNumber, BlockHash, Index)>"
Loading

Reviews (1): Last reviewed commit: "fix(storage): break out of prefix iterat..." | Re-trigger Greptile

@github-actions
Copy link
Copy Markdown

Lines of code report

Total lines added: 2
Total lines removed: 0
Total lines changed: 2

Detailed view
+--------------------------------+-------+------+
| File                           | Lines | Diff |
+--------------------------------+-------+------+
| ethrex/crates/storage/store.rs | 2684  | +2   |
+--------------------------------+-------+------+

…tion_location

`rust-rocksdb`'s prefix_iterator_cf seeks to the prefix but iterates forward
indefinitely — the caller must break manually when the key no longer matches.
The previous loop filtered non-matching keys via an `if` check but kept
iterating, silently scanning to the end of the TRANSACTION_LOCATIONS column
family on every call (hundreds of GB on mainnet, taking seconds to minutes
per call).

Add the missing `break`: once we see a non-matching key, all subsequent keys
also can't match (RocksDB key order is lexicographic and all entries with the
same tx_hash prefix are contiguous), so iteration can stop.

Closes #6688
@ElFantasma ElFantasma force-pushed the fix/6688-tx-location-prefix-scan branch from c21820e to 4f04508 Compare May 21, 2026 12:14
@ElFantasma ElFantasma changed the title fix(l1,storage): break out of prefix iterator on mismatch in get_transaction_location fix(l1): break out of prefix iterator on mismatch in get_transaction_location May 21, 2026
@github-project-automation github-project-automation Bot moved this to In Review in ethrex_l1 May 21, 2026
@ElFantasma ElFantasma added this pull request to the merge queue May 21, 2026
Merged via the queue into main with commit 490d2c6 May 21, 2026
59 of 61 checks passed
@ElFantasma ElFantasma deleted the fix/6688-tx-location-prefix-scan branch May 21, 2026 15:19
@github-project-automation github-project-automation Bot moved this from In Review to Done in ethrex_l1 May 21, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

L1 Ethereum client

Projects

Status: Done

Development

Successfully merging this pull request may close these issues.

perf(l1): eth_getTransactionByHash does unbounded RocksDB scan, makes node vulnerable to RPC retry storms

4 participants