
Optimization/prewarmer per sender #10335

Merged
kamilchodola merged 21 commits into performance from kch/optimize_warmup on Jan 27, 2026

Conversation

@kamilchodola
Contributor

Fixes Closes Resolves #

Please choose one of the keywords above to refer to the issue this PR solves followed by the issue number (e.g. Fixes #000). If no issue number, remove the line. Also, remove everything marked optional that is not applicable. Remove this note after reading.

Changes

  • List the changes

Types of changes

What types of changes does your code introduce?

  • Bugfix (a non-breaking change that fixes an issue)
  • New feature (a non-breaking change that adds functionality)
  • Breaking change (a change that causes existing functionality not to work as expected)
  • Optimization
  • Refactoring
  • Documentation update
  • Build-related changes
  • Other: Description

Testing

Requires testing

  • Yes
  • No

If yes, did you write tests?

  • Yes
  • No

Notes on testing

Optional. Remove if not applicable.

Documentation

Requires documentation update

  • Yes
  • No

If yes, link the PR to the docs update or the issue with the details labeled docs. Remove if not applicable.

Requires explanation in Release Notes

  • Yes
  • No

If yes, fill in the details here. Remove if not applicable.

Remarks

Optional. Remove if not applicable.

asdacap and others added 21 commits January 21, 2026 09:02
…10273)

* docs: add implementation plan for ProgressLogger trie visitor integration

Addresses #8504 - More use of ProgressLogger

Detailed step-by-step plan with:
- VisitorProgressTracker class implementation
- Unit tests for thread-safety and accuracy
- Integration into CopyTreeVisitor and TrieStatsCollector

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

* feat(trie): add VisitorProgressTracker for path-based progress estimation

Addresses #8504 - More use of ProgressLogger

- Tracks visited path prefixes at 4 levels (16 to 65536 granularity)
- Thread-safe for concurrent traversal
- Estimates progress from keyspace position, not node count

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
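
The multi-level prefix idea described above can be sketched as follows. This is a minimal illustration only, not the actual Nethermind `VisitorProgressTracker` (which is thread-safe and, per the later commits, was eventually simplified to a single level); all names here are hypothetical.

```csharp
using System;
using System.Collections;

// Sketch: one bit per distinct path prefix seen, at 1..4 nibble granularity
// (16, 256, 4096, and 65536 buckets).
BitArray[] seen = { new BitArray(16), new BitArray(256), new BitArray(4096), new BitArray(65536) };

void OnNodeVisited(byte[] pathNibbles)
{
    int index = 0;
    for (int level = 0; level < seen.Length && level < pathNibbles.Length; level++)
    {
        index = index * 16 + pathNibbles[level];
        seen[level][index] = true;   // mark this prefix of the keyspace as visited
    }
}

// Naive combiner: coarse levels dominate early, which is exactly the
// "early high values" issue later commits work around with a startup delay.
double Progress()
{
    double best = 0;
    for (int level = 0; level < seen.Length; level++)
    {
        int count = 0;
        foreach (bool bit in seen[level]) if (bit) count++;
        best = Math.Max(best, (double)count / seen[level].Length);
    }
    return best;
}
```

The key point is that progress is derived from position in the keyspace, not from a node counter, so it does not require knowing the total node count up front.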

* test(trie): add unit tests for VisitorProgressTracker

Tests cover:
- Progress tracking at different levels
- Thread-safety with concurrent calls
- Monotonically increasing progress
- Edge cases (short paths, empty path)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

* feat(pruning): integrate VisitorProgressTracker into CopyTreeVisitor

Replaces manual every-1M-nodes logging with path-based progress estimation.
Progress now shows actual percentage through the keyspace.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

* Improve TrieStatsCollector progress display

- Always enable progress tracking in TrieStatsCollector
- Add custom formatter to show node count instead of block speed
- Track max reported progress to prevent backwards jumps
- Display format: "Trie Verification  12.34% [...] nodes: 1.2M"

Fixes progress display issues where:
- Progress would jump backwards (12% → 5%) due to granularity switching
- Showed confusing "Blk/s" units for trie operations
- Displayed "11 / 100 (11.00%)" format that looked odd

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>

* docs: remove implementation plan documents

Implementation is complete, no need for plan docs in the codebase.

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>

* Track both state and storage nodes in progress display

The node count now includes both state and storage nodes, providing
a more accurate representation of total work done. Progress estimation
still uses state trie paths only.

Changes:
- Add _totalWorkDone counter for display (state + storage nodes)
- Add isStorage parameter to OnNodeVisited()
- Always increment total work, only track state nodes for progress

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>

* Optimize progress tracking with active level and startup delay

Improvements:
- Add 1 second startup delay before logging to prevent early high
  values from getting stuck in _maxReportedProgress
- Only track the deepest level with >5% coverage (active level)
- Stop incrementing counts for shallower levels once deeper level
  has significant coverage
- This ensures progress never shows less than 5% and provides
  more accurate granularity

Technical changes:
- Add _activeLevel field to track current deepest significant level
- Add _startTime field and skip logging for first second
- Only increment seen counts at active level or deeper
- Automatically promote to deeper level when >5% coverage reached

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>

* Simplify progress tracking to only use level 3 with leaf estimation

Changed to a much simpler approach as requested:
- Only track progress at level 3 (4 nibbles = 65536 possible nodes)
- For nodes at depth 4: increment count by 1
- For LEAF nodes at shallower depths: estimate coverage
  - Depth 1: covers 16^3 = 4096 level-3 nodes
  - Depth 2: covers 16^2 = 256 level-3 nodes
  - Depth 3: covers 16^1 = 16 level-3 nodes
- Non-leaf nodes at shallow depths: don't count (will be covered by deeper nodes)
- Keep 1 second startup delay to prevent early high percentages

This assumes the top of the tree is dense and provides accurate
progress estimation based on actual trie structure.

Updated tests to mark nodes as leaves where appropriate.

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
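
The simplified single-level scheme above can be sketched as follows (names are illustrative, not the actual API):

```csharp
using System;

// Sketch: progress tracked only at 4-nibble prefixes (16^4 = 65536 slots).
const int Level3Depth = 4;
const int MaxNodes = 65536;
int seenCount = 0;

void OnNodeVisited(int pathNibbleLength, bool isLeaf)
{
    if (pathNibbleLength == Level3Depth)
    {
        seenCount += 1;   // a node at depth 4 covers exactly one slot
    }
    else if (isLeaf && pathNibbleLength > 0 && pathNibbleLength < Level3Depth)
    {
        // A leaf at depth d has no subtree below it, so it accounts for
        // all 16^(4 - d) level-3 slots under its prefix.
        seenCount += (int)Math.Pow(16, Level3Depth - pathNibbleLength);
    }
    // Shallow non-leaf nodes are skipped; their subtrees surface at depth 4.
}

double Progress() => Math.Min(1.0, (double)seenCount / MaxNodes);
```

Because the top of a mainnet state trie is dense, nearly every 4-nibble prefix exists, so the slot count approximates keyspace coverage well.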

* Fix full pruning progress tracking

- Pass isStorage and isLeaf parameters in CopyTreeVisitor
- Storage nodes no longer contribute to state trie progress estimation
- Leaf nodes at shallow depths now correctly estimate coverage
- Increase startup delay to 5 seconds AND require at least 1% progress
- Prevents early high estimates from getting stuck in _maxReportedProgress

This fixes the issue where full pruning progress would immediately jump
to 100% and not show meaningful progress during the copy operation.

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>

* Simplify VisitorProgressTracker to single-level tracking

Since we only track level 3 (4 nibbles), remove unnecessary array
structure:

- Replace int[][] _seen with int[] _seen (65536 entries)
- Replace int[] _seenCounts with int _seenCount
- Replace int[] MaxAtLevel with const int MaxNodes
- Rename MaxLevel to Level3Depth for clarity

This reduces memory allocation from 69,904 ints (16+256+4096+65536)
to just 65,536 ints, and makes the code clearer.

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>

* Remove unnecessary _seen array from VisitorProgressTracker

Since OnNodeVisited is only called once per path, we don't need to
track which prefixes we've seen. Just increment _seenCount directly.

This eliminates the 65536-int array, reducing memory from 262KB to
just a few counters.

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>

* Remove _maxReportedProgress and allow progress to reverse

- Remove _maxReportedProgress field and backwards-prevention logic
- Report actual progress value even if it goes backwards
- Fix path.Length check: only count nodes at exactly Level3Depth
- Ignore nodes at depth > Level3Depth for progress calculation
- Simplify comment about startup delay

Progress should reflect reality, not be artificially constrained.

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>

* Fix lint

---------

Co-authored-by: Claude Opus 4.5 <noreply@anthropic.com>
* feat: enable taiko client ci integration tests

* fix: gh action structure to run l2_nmc locally

* feat: add path for ci-taiko file

* Update GitHub Actions checkout reference surge-taiko-mono
Co-authored-by: Lukasz Rozmej <lukasz.rozmej@gmail.com>
Warmup threads do not update tx.SpentGas
* Remove mark persisted

* Whitespace
* Test in chunks

* Test

* Sequential

* Test

* Simplify
Co-authored-by: emlautarom1 <emlautarom1@users.noreply.github.com>
Co-authored-by: rubo <rubo@users.noreply.github.com>
* fix(chainspec): add maxCodeSize to spaceneth for EIP-3860

* fix(chainspec): add explicit chainID to spaceneth

* fix(chainspec): add Prague system contracts to spaceneth genesis
* Initial plan

* Add warning when dirty prune cache is too low

When the dirty prune cache is too low, pruning cannot effectively reduce
the node cache, causing it to keep re-pruning with little progress.
This change adds a warning when the pruning cache size after pruning
is more than 80% of its size before pruning, suggesting to increase
the pruning cache limit.

Co-authored-by: asdacap <1841324+asdacap@users.noreply.github.com>

* Extract magic number to named constant

Extract the 0.8 threshold to PruningEfficiencyWarningThreshold constant
for better code readability and maintainability.

Co-authored-by: asdacap <1841324+asdacap@users.noreply.github.com>

* Mention --Pruning.DirtyCacheMb argument in warning message

Updated the warning message to include the specific command-line
argument (--Pruning.DirtyCacheMb) that users can use to increase
the pruning cache limit.

Co-authored-by: asdacap <1841324+asdacap@users.noreply.github.com>

* Include recommended cache size in warning message

Added calculation and display of recommended dirty cache size
(current size + 30%) in the warning message to provide users
with a concrete value to set.

Co-authored-by: asdacap <1841324+asdacap@users.noreply.github.com>
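
The warning logic described in these commits can be sketched as below. This is illustrative only; the real check lives in TrieStore.cs, the exact message differs, and whether "current size" means the before-pruning size or the configured limit is an assumption here.

```csharp
using System;

// Sketch of the dirty-cache pruning efficiency warning.
const double PruningEfficiencyWarningThreshold = 0.9;

// Returns a suggestion when pruning retained > 90% of the cache, else null.
string CheckPruningEfficiency(long memoryBeforeMb, long memoryAfterMb)
{
    if (memoryAfterMb <= memoryBeforeMb * PruningEfficiencyWarningThreshold)
        return null;   // pruning removed enough; no warning needed

    long recommendedMb = (long)(memoryBeforeMb * 1.3);   // current size + 30%
    return $"Pruning was not effective. Consider increasing " +
           $"--pruning-dirtycachemb to {recommendedMb}.";
}
```

The message-building work sits behind the threshold check, matching the review feedback to skip the calculations when no warning will be logged.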

* Update warning threshold to 0.9 and use lowercase argument format

Changed PruningEfficiencyWarningThreshold from 0.8 to 0.9 (now warns
when retention ratio > 90% instead of > 80%). Updated argument format
in warning message from --Pruning.DirtyCacheMb to --pruning-dirtycachemb.

Co-authored-by: asdacap <1841324+asdacap@users.noreply.github.com>

* Optimize warning check and add guard for sparse networks

- Move _logger.IsWarn check to outer if statement to skip calculations
  when logging is disabled
- Add minimum threshold (256MB) to prevent false positives on sparse
  networks with many empty blocks
- Addresses code review feedback from @LukaszRozmej

Co-authored-by: asdacap <1841324+asdacap@users.noreply.github.com>

* Update src/Nethermind/Nethermind.Trie/Pruning/TrieStore.cs

Co-authored-by: Lukasz Rozmej <lukasz.rozmej@gmail.com>

* Remove 256MB threshold and add cspell ignore for CLI args

- Removed 256MB minimum threshold as function is only called when
  memory exceeds pruning threshold (per @asdacap feedback)
- Added cspell ignore rule for command-line arguments pattern
  (--something-something) to fix cspell warnings
- Addresses feedback from @LukaszRozmej, @asdacap, and @flcl42

Co-authored-by: flcl42 <630501+flcl42@users.noreply.github.com>

* Simplify cspell regex to match exactly two-word CLI flags

Changed pattern from /--[a-z]+(-[a-z]+)*/gi to /--[a-z]+-[a-z]+/gi
since all CLI flags are always exactly two words (e.g., --pruning-dirtycachemb).
Addresses feedback from @flcl42.

Co-authored-by: flcl42 <630501+flcl42@users.noreply.github.com>
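
The effect of the tightened pattern can be checked with a quick regex test (illustrative; cspell applies the pattern internally, this just demonstrates what it matches):

```csharp
using System;
using System.Text.RegularExpressions;

// The simplified cspell ignore pattern: "--" plus exactly two lowercase
// words joined by one hyphen, case-insensitive as in the /gi flags.
var flagPattern = new Regex("--[a-z]+-[a-z]+", RegexOptions.IgnoreCase);

bool twoWordFlag = flagPattern.IsMatch("--pruning-dirtycachemb"); // matches
bool singleWordFlag = flagPattern.IsMatch("--verbosity");         // no match
```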

---------

Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com>
Co-authored-by: asdacap <1841324+asdacap@users.noreply.github.com>
Co-authored-by: Amirul Ashraf <asdacap@gmail.com>
Co-authored-by: Lukasz Rozmej <lukasz.rozmej@gmail.com>
Co-authored-by: flcl42 <630501+flcl42@users.noreply.github.com>
…ntRange (#10298)

Update SnapProviderHelper.cs

Co-authored-by: Lukasz Rozmej <lukasz.rozmej@gmail.com>
…oW sync (#10307)

* fix(sync): Handle OperationCanceledException as timeout in PowForwardHeaderProvider

## Problem

PoW chain sync (ETC, etc.) stops completely after a single header request
timeout when running in DEBUG mode. The sync stalls with "SyncDispatcher
has finished work" even though blocks remain to sync.

## Root Cause

Commit cc56a03 ("Reduce exceptions in ZeroProtocolHandlerBase") changed
timeout handling from throwing TimeoutException to calling TrySetCanceled():

```csharp
// Before: throw new TimeoutException(...);
// After:  request.CompletionSource.TrySetCanceled(cancellationToken);
```

This was a performance optimization to reduce exception overhead, but it
changed the contract: callers expecting TimeoutException now receive
OperationCanceledException (via TaskCanceledException).

PowForwardHeaderProvider only caught TimeoutException:

```csharp
catch (TimeoutException)
{
    syncPeerPool.ReportWeakPeer(bestPeer, AllocationContexts.ForwardHeader);
    return null;
}
```

The uncaught OperationCanceledException propagates to BlockDownloader which,
in DEBUG mode, re-throws it:

```csharp
#if DEBUG
    throw;      // DEBUG: propagates, kills sync
#else
    return null; // RELEASE: swallows error, sync continues
#endif
```

SyncDispatcher interprets OperationCanceledException as "sync was cancelled"
and calls Feed.Finish(), stopping sync permanently.

## The Fix

Add a catch for OperationCanceledException with a guard clause:

```csharp
catch (OperationCanceledException) when (!cancellation.IsCancellationRequested)
{
    syncPeerPool.ReportWeakPeer(bestPeer, AllocationContexts.ForwardHeader);
    return null;
}
```

The condition `when (!cancellation.IsCancellationRequested)` distinguishes:
- Protocol timeout: original token NOT cancelled → handle as weak peer
- Real sync cancellation: original token IS cancelled → propagate exception

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

* fix(trie): Mark BlockCommitSet as sealed even when root is null

BlockCommitSet.IsSealed returned `Root is not null`, which was false for
empty state tries where root is null. This caused a Debug.Assert failure
in TrieStore.VerifyNewCommitSet when running in Debug mode, as the
assertion checked that the previous BlockCommitSet was sealed before
starting a new block commit.

An empty state trie with Keccak.EmptyTreeHash is valid (e.g., genesis
blocks with no allocations). Changed IsSealed to use a separate _isSealed
flag that is set when Seal() is called, regardless of whether the root
is null.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
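
The change can be sketched as follows (a hypothetical simplification of BlockCommitSet, not the actual class):

```csharp
using System;

// Sketch: sealed state tracked by an explicit flag instead of root nullability.
object root = null;
bool isSealedFlag = false;

// Before the fix: sealed was inferred from the root, so an empty state trie
// (root null, hash Keccak.EmptyTreeHash) looked unsealed after Seal().
bool IsSealedBefore() => root is not null;

// After the fix: the flag is set by Seal() regardless of the root value.
bool IsSealedAfter() => isSealedFlag;

void Seal(object newRoot)
{
    root = newRoot;       // may legitimately be null for an empty state trie
    isSealedFlag = true;
}

Seal(null);               // genesis-style block with no allocations
```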

* Apply suggestions from code review

---------

Co-authored-by: Claude Opus 4.5 <noreply@anthropic.com>
Co-authored-by: Lukasz Rozmej <lukasz.rozmej@gmail.com>
Update TrieStore.cs

Co-authored-by: Lukasz Rozmej <lukasz.rozmej@gmail.com>
* fix: correct off-by-one in ArrayPoolListCore.RemoveAt

* add test

* Update src/Nethermind/Nethermind.Core/Collections/ArrayListCore.cs

Co-authored-by: Lukasz Rozmej <lukasz.rozmej@gmail.com>

---------

Co-authored-by: Lukasz Rozmej <lukasz.rozmej@gmail.com>
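
The PR text does not show the ArrayPoolListCore code itself, but this class of off-by-one can be illustrated generically (everything below is a hypothetical example, not the actual fix):

```csharp
using System;

// Generic illustration of an off-by-one hazard in a RemoveAt that shifts
// the tail of a backing array left by one position.
void RemoveAt(int[] items, ref int count, int index)
{
    // Elements to shift: count - index - 1. Using count - index here would
    // copy one element past the logical end of the list.
    Array.Copy(items, index + 1, items, index, count - index - 1);
    count--;
}

int[] data = { 1, 2, 3, 4, 0 };  // capacity 5, logical length 4
int length = 4;
RemoveAt(data, ref length, 1);   // remove the value 2
```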
@kamilchodola kamilchodola requested review from a team and rubo as code owners January 27, 2026 00:02
@kamilchodola kamilchodola merged commit 99fef4b into performance Jan 27, 2026
65 of 66 checks passed
@kamilchodola kamilchodola deleted the kch/optimize_warmup branch January 27, 2026 00:02
benaadams added a commit that referenced this pull request Jan 28, 2026