Skip to content

feat(static-file): incremental changeset offset storage#21593

Closed
gakonst wants to merge 1 commit into
mainfrom
joshie/incremental-changeset-offsets
Closed

feat(static-file): incremental changeset offset storage#21593
gakonst wants to merge 1 commit into
mainfrom
joshie/incremental-changeset-offsets

Conversation

@gakonst
Copy link
Copy Markdown
Member

@gakonst gakonst commented Jan 29, 2026

Summary

Replace inline Vec<ChangesetOffset> in SegmentHeader with a separate .csoff sidecar file for incremental append/prune operations.

Problem

Previously, changeset offsets were stored as Vec<ChangesetOffset> in SegmentHeader and fully rewritten on every commit. For segments with 500k+ blocks, this meant ~8MB rewritten per commit even when appending a single block.

Solution

  • Store offsets in a separate .csoff sidecar file (fixed 16-byte records)
  • SegmentHeader now stores only changeset_offsets_len: u64 (count)
  • Crash consistency: sidecar is synced before header commit

Performance Impact

Operation Before After
Append 1 block O(total_blocks) O(1) (16 bytes)
Commit overhead ~8MB (500k blocks) ~100 bytes
Prune O(remaining_blocks) O(1)
Lookup O(1) from memory O(1) from mmap/pread

Changes

reth-static-file-types

  • Add ChangesetOffsetsWriter/ChangesetOffsetsReader for sidecar I/O
  • Replace changeset_offsets: Vec with changeset_offsets_len: u64 in SegmentHeader
  • Update serialization to write u64 count instead of Vec

reth-nippy-jar

  • Add changeset_offsets_path() for .csoff file paths
  • Update delete() to clean up sidecar files

reth-provider

  • Add ChangesetOffsetsWriter to StaticFileProviderRW
  • Track current block offset during writes, write to sidecar on block completion
  • Sync sidecar before header commit (crash consistency)
  • Add read_changeset_offset()/read_changeset_offsets() to StaticFileJarProvider
  • Update manager.rs to read offsets from sidecar file
  • Update truncate_changesets to truncate sidecar file

Breaking Change

This changes the static file format for changeset segments. Existing changeset static files are not backwards compatible and must be regenerated.

Testing

cargo test -p reth-static-file-types
cargo check -p reth-provider

cc @joshieDo @mattsse

…cture

Currently, changeset offsets (Vec<ChangesetOffset>) are stored inline in
SegmentHeader and fully rewritten on every commit. For segments with 500k+
blocks, this means ~8MB rewritten per commit even when appending a single
block.

This PR adds the infrastructure for incremental changeset offset storage:

1. ChangesetOffsetsMeta: Lightweight metadata struct (len + version) that
   will replace the inline Vec in SegmentHeader
2. ChangesetOffsetWriter: Append-only writer for the new .csoff sidecar file
3. ChangesetOffsetReader: O(1) random-access reader using fixed 16-byte records
4. CHANGESET_OFFSETS_FILE_EXTENSION constant for the new file type
5. Design doc explaining the migration strategy and crash consistency

Performance impact:
- Append: O(total_blocks) -> O(1) (16 bytes per block)
- Commit overhead: ~8MB for 500k blocks -> ~100 bytes (header only)
- Prune: O(remaining_blocks) -> O(1) (len update only)

Follow-up PRs will:
- Integrate the sidecar into SegmentHeader (replace Vec with Meta)
- Update StaticFileProviderRW commit/prune paths
- Add migration logic for existing segments
@gakonst
Copy link
Copy Markdown
Member Author

gakonst commented Jan 29, 2026

CI Fix Blocked

The base commit (feat: sparse trie as cache #21583) introduced inconsistencies in SegmentHeader:

  1. The struct uses changeset_offsets: Option<Vec<ChangesetOffset>>
  2. But methods like changeset_offsets_len() try to access self.changeset_offsets_len (a non-existent field)
  3. Tests use changeset_offsets_len: 100 but struct expects changeset_offsets: Some(vec![...])

Options:

  1. Fix the base commit first (separate PR to fix feat: sparse trie as cache #21583's breakage)
  2. This PR should include fixes for the base commit inconsistencies

Closing this draft PR until the base is stable. Will re-open with a clean implementation.

@gakonst
Copy link
Copy Markdown
Member Author

gakonst commented Jan 29, 2026

Closing due to inconsistencies in base commit. Will re-open after fixing upstream issues.

@gakonst gakonst closed this Jan 29, 2026
@github-project-automation github-project-automation Bot moved this from Backlog to Done in Reth Tracker Jan 29, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

Status: Done

Development

Successfully merging this pull request may close these issues.

1 participant