Skip to content

perf(provider): make static-file changeset counts O(files)#23296

Closed
decofe wants to merge 6 commits into
mainfrom
yk/fix-changeset-count-perf
Closed

perf(provider): make static-file changeset counts O(files)#23296
decofe wants to merge 6 commits into
mainfrom
yk/fix-changeset-count-perf

Conversation

@decofe
Copy link
Copy Markdown
Member

@decofe decofe commented Mar 31, 2026

Closes RETH-694

changeset_count() scans every .csoff record to compute a progress-bar denominator for the history indexing stages.

  • At block 10.7M, that's ~10.7M 16-byte reads per call, taking ~9s
  • Both account and storage counts run every pipeline cycle → ~20s total
  • This is longer than the batch being processed, so after a restart the pipeline can never catch up

Since .csoff records are cumulative [offset, num_changes], the total is just last.offset + last.num_changes.

  • Reads one record per file instead of all records
  • O(files) instead of O(blocks) → 22 reads instead of 10.7M

Before: 502ms (2.2M blocks), ~9s (10.7M blocks)
After: 0.2ms (2.2M blocks), ~0.3ms (10.7M blocks)

Repro and timings: https://gist.github.com/yongkangc/06d1da54a32b4a1818c9432483fef7a6

Co-Authored-By: YK <46377366+yongkangc@users.noreply.github.com>
@github-actions
Copy link
Copy Markdown
Contributor

github-actions Bot commented Mar 31, 2026

⚠️ Changelog not found.

A changelog entry is required before merging. We've generated a suggested changelog based on your changes:

Preview
---
reth-static-file-types: minor
reth-provider: patch
---

Added `total_changes()` method to `ChangesetOffsetReader` that efficiently reads only the last sidecar record instead of iterating all records. Refactored `account_changeset_count` and `storage_changeset_count` in `StaticFileProvider` to share a common `changeset_count_for_segment` helper using the new method, and added integration tests for both.

Add changelog to commit this to your branch.

Co-authored-by: YK <46377366+yongkangc@users.noreply.github.com>
Amp-Thread-ID: https://ampcode.com/threads/T-019d423c-8436-76a8-8b92-e700bd180b8a
@yongkangc yongkangc added C-bug An unexpected or incorrect behavior A-db Related to the database A-static-files Related to static files labels Mar 31, 2026
@yongkangc yongkangc requested a review from Rjected March 31, 2026 04:57
@yongkangc yongkangc marked this pull request as draft March 31, 2026 04:58
Co-authored-by: YK <46377366+yongkangc@users.noreply.github.com>
Amp-Thread-ID: https://ampcode.com/threads/T-019d423c-8436-76a8-8b92-e700bd180b8a
@yongkangc yongkangc changed the title perf: make changeset_count() O(files) instead of O(blocks) perf(provider): make static-file changeset counts O(files) Mar 31, 2026
Co-authored-by: YK <46377366+yongkangc@users.noreply.github.com>
Amp-Thread-ID: https://ampcode.com/threads/T-019d423c-8436-76a8-8b92-e700bd180b8a
Co-authored-by: YK <46377366+yongkangc@users.noreply.github.com>
Amp-Thread-ID: https://ampcode.com/threads/T-019d423c-8436-76a8-8b92-e700bd180b8a
@yongkangc yongkangc marked this pull request as ready for review March 31, 2026 05:29
Copilot AI review requested due to automatic review settings March 31, 2026 05:29
Co-Authored-By: YK <46377366+yongkangc@users.noreply.github.com>
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR optimizes static-file changeset progress denominator computations by avoiding per-record scans of .csoff sidecar files. Instead, it derives the total count per static file from the last committed changeset-offset record, reducing *_changeset_count() from O(blocks) to O(files) while preserving existing stage semantics.

Changes:

  • Replace account_changeset_count() / storage_changeset_count() implementations to sum per-file totals computed from the last committed .csoff record.
  • Add ChangesetOffsetReader::total_changes() to compute total changes via a single last-record read.
  • Add regression tests covering changeset counts across multiple static files for both account and storage changesets.

Reviewed changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated no comments.

File Description
crates/storage/provider/src/providers/static_file/manager.rs Switch changeset count computation to an O(files) approach using a shared helper that reads only the last committed sidecar record per file.
crates/static-file/types/src/changeset_offsets.rs Add total_changes() API on the sidecar reader plus unit tests validating behavior and committed-length handling.
crates/storage/provider/src/providers/static_file/mod.rs Add integration-style tests ensuring account/storage changeset counts are correct across multiple static files.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

@emmajam emmajam closed this Apr 1, 2026
@github-project-automation github-project-automation Bot moved this from Backlog to Done in Reth Tracker Apr 1, 2026
@emmajam emmajam deleted the yk/fix-changeset-count-perf branch May 1, 2026 09:50
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

A-db Related to the database A-static-files Related to static files C-bug An unexpected or incorrect behavior

Projects

Status: Done

Development

Successfully merging this pull request may close these issues.

4 participants