
Optimization/prewarmer per sender #10335

Merged
kamilchodola merged 21 commits into performance from kch/optimize_warmup on Jan 27, 2026

Conversation

@kamilchodola
Contributor

Fixes Closes Resolves #

Please choose one of the keywords above to refer to the issue this PR solves followed by the issue number (e.g. Fixes #000). If no issue number, remove the line. Also, remove everything marked optional that is not applicable. Remove this note after reading.

Changes

  • List the changes

Types of changes

What types of changes does your code introduce?

  • Bugfix (a non-breaking change that fixes an issue)
  • New feature (a non-breaking change that adds functionality)
  • Breaking change (a change that causes existing functionality not to work as expected)
  • Optimization
  • Refactoring
  • Documentation update
  • Build-related changes
  • Other: Description

Testing

Requires testing

  • Yes
  • No

If yes, did you write tests?

  • Yes
  • No

Notes on testing

Optional. Remove if not applicable.

Documentation

Requires documentation update

  • Yes
  • No

If yes, link the PR to the docs update or the issue with the details labeled docs. Remove if not applicable.

Requires explanation in Release Notes

  • Yes
  • No

If yes, fill in the details here. Remove if not applicable.

Remarks

Optional. Remove if not applicable.

asdacap and others added 21 commits January 21, 2026 09:02
…10273)

* docs: add implementation plan for ProgressLogger trie visitor integration

Addresses #8504 - More use of ProgressLogger

Detailed step-by-step plan with:
- VisitorProgressTracker class implementation
- Unit tests for thread-safety and accuracy
- Integration into CopyTreeVisitor and TrieStatsCollector

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

* feat(trie): add VisitorProgressTracker for path-based progress estimation

Addresses #8504 - More use of ProgressLogger

- Tracks visited path prefixes at 4 levels (16 to 65536 granularity)
- Thread-safe for concurrent traversal
- Estimates progress from keyspace position, not node count

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
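
The multi-level prefix idea described above can be sketched as follows. This is a minimal illustration only, not the actual Nethermind `VisitorProgressTracker` (which is thread-safe and, per the later commits, was eventually simplified to a single level); all names here are hypothetical.

```csharp
using System;
using System.Collections;

// Sketch: one bit per distinct path prefix seen, at 1..4 nibble granularity
// (16, 256, 4096, and 65536 buckets).
BitArray[] seen = { new BitArray(16), new BitArray(256), new BitArray(4096), new BitArray(65536) };

void OnNodeVisited(byte[] pathNibbles)
{
    int index = 0;
    for (int level = 0; level < seen.Length && level < pathNibbles.Length; level++)
    {
        index = index * 16 + pathNibbles[level];
        seen[level][index] = true;   // mark this prefix of the keyspace as visited
    }
}

// Naive combiner: coarse levels dominate early, which is exactly the
// "early high values" issue later commits work around with a startup delay.
double Progress()
{
    double best = 0;
    for (int level = 0; level < seen.Length; level++)
    {
        int count = 0;
        foreach (bool bit in seen[level]) if (bit) count++;
        best = Math.Max(best, (double)count / seen[level].Length);
    }
    return best;
}
```

The key point is that progress is derived from position in the keyspace, not from a node counter, so it does not require knowing the total node count up front.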

* test(trie): add unit tests for VisitorProgressTracker

Tests cover:
- Progress tracking at different levels
- Thread-safety with concurrent calls
- Monotonically increasing progress
- Edge cases (short paths, empty path)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

* feat(pruning): integrate VisitorProgressTracker into CopyTreeVisitor

Replaces manual every-1M-nodes logging with path-based progress estimation.
Progress now shows actual percentage through the keyspace.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

* Improve TrieStatsCollector progress display

- Always enable progress tracking in TrieStatsCollector
- Add custom formatter to show node count instead of block speed
- Track max reported progress to prevent backwards jumps
- Display format: "Trie Verification  12.34% [...] nodes: 1.2M"

Fixes progress display issues where:
- Progress would jump backwards (12% → 5%) due to granularity switching
- Showed confusing "Blk/s" units for trie operations
- Displayed "11 / 100 (11.00%)" format that looked odd

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>

* docs: remove implementation plan documents

Implementation is complete, no need for plan docs in the codebase.

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>

* Track both state and storage nodes in progress display

The node count now includes both state and storage nodes, providing
a more accurate representation of total work done. Progress estimation
still uses state trie paths only.

Changes:
- Add _totalWorkDone counter for display (state + storage nodes)
- Add isStorage parameter to OnNodeVisited()
- Always increment total work, only track state nodes for progress

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>

* Optimize progress tracking with active level and startup delay

Improvements:
- Add 1 second startup delay before logging to prevent early high
  values from getting stuck in _maxReportedProgress
- Only track the deepest level with >5% coverage (active level)
- Stop incrementing counts for shallower levels once deeper level
  has significant coverage
- This ensures progress never shows less than 5% and provides
  more accurate granularity

Technical changes:
- Add _activeLevel field to track current deepest significant level
- Add _startTime field and skip logging for first second
- Only increment seen counts at active level or deeper
- Automatically promote to deeper level when >5% coverage reached

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>

* Simplify progress tracking to only use level 3 with leaf estimation

Changed to a much simpler approach as requested:
- Only track progress at level 3 (4 nibbles = 65536 possible nodes)
- For nodes at depth 4: increment count by 1
- For LEAF nodes at shallower depths: estimate coverage
  - Depth 1: covers 16^3 = 4096 level-3 nodes
  - Depth 2: covers 16^2 = 256 level-3 nodes
  - Depth 3: covers 16^1 = 16 level-3 nodes
- Non-leaf nodes at shallow depths: don't count (will be covered by deeper nodes)
- Keep 1 second startup delay to prevent early high percentages

This assumes the top of the tree is dense and provides accurate
progress estimation based on actual trie structure.

Updated tests to mark nodes as leaves where appropriate.

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
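
The simplified single-level scheme above can be sketched as follows (names are illustrative, not the actual API):

```csharp
using System;

// Sketch: progress tracked only at 4-nibble prefixes (16^4 = 65536 slots).
const int Level3Depth = 4;
const int MaxNodes = 65536;
int seenCount = 0;

void OnNodeVisited(int pathNibbleLength, bool isLeaf)
{
    if (pathNibbleLength == Level3Depth)
    {
        seenCount += 1;   // a node at depth 4 covers exactly one slot
    }
    else if (isLeaf && pathNibbleLength > 0 && pathNibbleLength < Level3Depth)
    {
        // A leaf at depth d has no subtree below it, so it accounts for
        // all 16^(4 - d) level-3 slots under its prefix.
        seenCount += (int)Math.Pow(16, Level3Depth - pathNibbleLength);
    }
    // Shallow non-leaf nodes are skipped; their subtrees surface at depth 4.
}

double Progress() => Math.Min(1.0, (double)seenCount / MaxNodes);
```

Because the top of a mainnet state trie is dense, nearly every 4-nibble prefix exists, so the slot count approximates keyspace coverage well.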

* Fix full pruning progress tracking

- Pass isStorage and isLeaf parameters in CopyTreeVisitor
- Storage nodes no longer contribute to state trie progress estimation
- Leaf nodes at shallow depths now correctly estimate coverage
- Increase startup delay to 5 seconds AND require at least 1% progress
- Prevents early high estimates from getting stuck in _maxReportedProgress

This fixes the issue where full pruning progress would immediately jump
to 100% and not show meaningful progress during the copy operation.

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>

* Simplify VisitorProgressTracker to single-level tracking

Since we only track level 3 (4 nibbles), remove unnecessary array
structure:

- Replace int[][] _seen with int[] _seen (65536 entries)
- Replace int[] _seenCounts with int _seenCount
- Replace int[] MaxAtLevel with const int MaxNodes
- Rename MaxLevel to Level3Depth for clarity

This reduces memory allocation from 69,904 ints (16+256+4096+65536)
to just 65,536 ints, and makes the code clearer.

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>

* Remove unnecessary _seen array from VisitorProgressTracker

Since OnNodeVisited is only called once per path, we don't need to
track which prefixes we've seen. Just increment _seenCount directly.

This eliminates the 65536-int array, reducing memory from 262KB to
just a few counters.

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>

* Remove _maxReportedProgress and allow progress to reverse

- Remove _maxReportedProgress field and backwards-prevention logic
- Report actual progress value even if it goes backwards
- Fix path.Length check: only count nodes at exactly Level3Depth
- Ignore nodes at depth > Level3Depth for progress calculation
- Simplify comment about startup delay

Progress should reflect reality, not be artificially constrained.

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>

* Fix lint

---------

Co-authored-by: Claude Opus 4.5 <noreply@anthropic.com>
* feat: enable taiko client ci integration tests

* fix: gh action structure to run l2_nmc locally

* feat: add path for ci-taiko file

* Update GitHub Actions checkout reference surge-taiko-mono
Co-authored-by: Lukasz Rozmej <lukasz.rozmej@gmail.com>
Warmup threads do not update tx.SpentGas
* Remove mark persisted

* Whitespace
* Test in chunks

* Test

* Sequential

* Test

* Simplify
Co-authored-by: emlautarom1 <emlautarom1@users.noreply.github.com>
Co-authored-by: rubo <rubo@users.noreply.github.com>
* fix(chainspec): add maxCodeSize to spaceneth for EIP-3860

* fix(chainspec): add explicit chainID to spaceneth

* fix(chainspec): add Prague system contracts to spaceneth genesis
* Initial plan

* Add warning when dirty prune cache is too low

When the dirty prune cache is too low, pruning cannot effectively reduce
the node cache, causing it to keep re-pruning with little progress.
This change adds a warning when the pruning cache size after pruning
is more than 80% of its size before pruning, suggesting to increase
the pruning cache limit.

Co-authored-by: asdacap <1841324+asdacap@users.noreply.github.com>

* Extract magic number to named constant

Extract the 0.8 threshold to PruningEfficiencyWarningThreshold constant
for better code readability and maintainability.

Co-authored-by: asdacap <1841324+asdacap@users.noreply.github.com>

* Mention --Pruning.DirtyCacheMb argument in warning message

Updated the warning message to include the specific command-line
argument (--Pruning.DirtyCacheMb) that users can use to increase
the pruning cache limit.

Co-authored-by: asdacap <1841324+asdacap@users.noreply.github.com>

* Include recommended cache size in warning message

Added calculation and display of recommended dirty cache size
(current size + 30%) in the warning message to provide users
with a concrete value to set.

Co-authored-by: asdacap <1841324+asdacap@users.noreply.github.com>
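
The warning logic described in these commits can be sketched as below. This is illustrative only; the real check lives in TrieStore.cs, the exact message differs, and whether "current size" means the before-pruning size or the configured limit is an assumption here.

```csharp
using System;

// Sketch of the dirty-cache pruning efficiency warning.
const double PruningEfficiencyWarningThreshold = 0.9;

// Returns a suggestion when pruning retained > 90% of the cache, else null.
string CheckPruningEfficiency(long memoryBeforeMb, long memoryAfterMb)
{
    if (memoryAfterMb <= memoryBeforeMb * PruningEfficiencyWarningThreshold)
        return null;   // pruning removed enough; no warning needed

    long recommendedMb = (long)(memoryBeforeMb * 1.3);   // current size + 30%
    return $"Pruning was not effective. Consider increasing " +
           $"--pruning-dirtycachemb to {recommendedMb}.";
}
```

The message-building work sits behind the threshold check, matching the review feedback to skip the calculations when no warning will be logged.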

* Update warning threshold to 0.9 and use lowercase argument format

Changed PruningEfficiencyWarningThreshold from 0.8 to 0.9 (now warns
when retention ratio > 90% instead of > 80%). Updated argument format
in warning message from --Pruning.DirtyCacheMb to --pruning-dirtycachemb.

Co-authored-by: asdacap <1841324+asdacap@users.noreply.github.com>

* Optimize warning check and add guard for sparse networks

- Move _logger.IsWarn check to outer if statement to skip calculations
  when logging is disabled
- Add minimum threshold (256MB) to prevent false positives on sparse
  networks with many empty blocks
- Addresses code review feedback from @LukaszRozmej

Co-authored-by: asdacap <1841324+asdacap@users.noreply.github.com>

* Update src/Nethermind/Nethermind.Trie/Pruning/TrieStore.cs

Co-authored-by: Lukasz Rozmej <lukasz.rozmej@gmail.com>

* Remove 256MB threshold and add cspell ignore for CLI args

- Removed 256MB minimum threshold as function is only called when
  memory exceeds pruning threshold (per @asdacap feedback)
- Added cspell ignore rule for command-line arguments pattern
  (--something-something) to fix cspell warnings
- Addresses feedback from @LukaszRozmej, @asdacap, and @flcl42

Co-authored-by: flcl42 <630501+flcl42@users.noreply.github.com>

* Simplify cspell regex to match exactly two-word CLI flags

Changed pattern from /--[a-z]+(-[a-z]+)*/gi to /--[a-z]+-[a-z]+/gi
since all CLI flags are always exactly two words (e.g., --pruning-dirtycachemb).
Addresses feedback from @flcl42.

Co-authored-by: flcl42 <630501+flcl42@users.noreply.github.com>
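
The effect of the tightened pattern can be checked with a quick regex test (illustrative; cspell applies the pattern internally, this just demonstrates what it matches):

```csharp
using System;
using System.Text.RegularExpressions;

// The simplified cspell ignore pattern: "--" plus exactly two lowercase
// words joined by one hyphen, case-insensitive as in the /gi flags.
var flagPattern = new Regex("--[a-z]+-[a-z]+", RegexOptions.IgnoreCase);

bool twoWordFlag = flagPattern.IsMatch("--pruning-dirtycachemb"); // matches
bool singleWordFlag = flagPattern.IsMatch("--verbosity");         // no match
```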

---------

Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com>
Co-authored-by: asdacap <1841324+asdacap@users.noreply.github.com>
Co-authored-by: Amirul Ashraf <asdacap@gmail.com>
Co-authored-by: Lukasz Rozmej <lukasz.rozmej@gmail.com>
Co-authored-by: flcl42 <630501+flcl42@users.noreply.github.com>
…ntRange (#10298)

Update SnapProviderHelper.cs

Co-authored-by: Lukasz Rozmej <lukasz.rozmej@gmail.com>
…oW sync (#10307)

* fix(sync): Handle OperationCanceledException as timeout in PowForwardHeaderProvider

## Problem

PoW chain sync (ETC, etc.) stops completely after a single header request
timeout when running in DEBUG mode. The sync stalls with "SyncDispatcher
has finished work" even though blocks remain to sync.

## Root Cause

Commit cc56a03 ("Reduce exceptions in ZeroProtocolHandlerBase") changed
timeout handling from throwing TimeoutException to calling TrySetCanceled():

```csharp
// Before: throw new TimeoutException(...);
// After:  request.CompletionSource.TrySetCanceled(cancellationToken);
```

This was a performance optimization to reduce exception overhead, but it
changed the contract: callers expecting TimeoutException now receive
OperationCanceledException (via TaskCanceledException).

PowForwardHeaderProvider only caught TimeoutException:

```csharp
catch (TimeoutException)
{
    syncPeerPool.ReportWeakPeer(bestPeer, AllocationContexts.ForwardHeader);
    return null;
}
```

The uncaught OperationCanceledException propagates to BlockDownloader which,
in DEBUG mode, re-throws it:

```csharp
#if DEBUG
    throw;      // DEBUG: propagates, kills sync
#else
    return null; // RELEASE: swallows error, sync continues
#endif
```

SyncDispatcher interprets OperationCanceledException as "sync was cancelled"
and calls Feed.Finish(), stopping sync permanently.

## The Fix

Add a catch for OperationCanceledException with a guard clause:

```csharp
catch (OperationCanceledException) when (!cancellation.IsCancellationRequested)
{
    syncPeerPool.ReportWeakPeer(bestPeer, AllocationContexts.ForwardHeader);
    return null;
}
```

The condition `when (!cancellation.IsCancellationRequested)` distinguishes:
- Protocol timeout: original token NOT cancelled → handle as weak peer
- Real sync cancellation: original token IS cancelled → propagate exception

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

* fix(trie): Mark BlockCommitSet as sealed even when root is null

BlockCommitSet.IsSealed returned `Root is not null`, which was false for
empty state tries where root is null. This caused a Debug.Assert failure
in TrieStore.VerifyNewCommitSet when running in Debug mode, as the
assertion checked that the previous BlockCommitSet was sealed before
starting a new block commit.

An empty state trie with Keccak.EmptyTreeHash is valid (e.g., genesis
blocks with no allocations). Changed IsSealed to use a separate _isSealed
flag that is set when Seal() is called, regardless of whether the root
is null.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
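
The change can be sketched as follows (a hypothetical simplification of BlockCommitSet, not the actual class):

```csharp
using System;

// Sketch: sealed state tracked by an explicit flag instead of root nullability.
object root = null;
bool isSealedFlag = false;

// Before the fix: sealed was inferred from the root, so an empty state trie
// (root null, hash Keccak.EmptyTreeHash) looked unsealed after Seal().
bool IsSealedBefore() => root is not null;

// After the fix: the flag is set by Seal() regardless of the root value.
bool IsSealedAfter() => isSealedFlag;

void Seal(object newRoot)
{
    root = newRoot;       // may legitimately be null for an empty state trie
    isSealedFlag = true;
}

Seal(null);               // genesis-style block with no allocations
```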

* Apply suggestions from code review

---------

Co-authored-by: Claude Opus 4.5 <noreply@anthropic.com>
Co-authored-by: Lukasz Rozmej <lukasz.rozmej@gmail.com>
Update TrieStore.cs

Co-authored-by: Lukasz Rozmej <lukasz.rozmej@gmail.com>
* fix: correct off-by-one in ArrayPoolListCore.RemoveAt

* add test

* Update src/Nethermind/Nethermind.Core/Collections/ArrayListCore.cs

Co-authored-by: Lukasz Rozmej <lukasz.rozmej@gmail.com>

---------

Co-authored-by: Lukasz Rozmej <lukasz.rozmej@gmail.com>
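
The PR text does not show the ArrayPoolListCore code itself, but this class of off-by-one can be illustrated generically (everything below is a hypothetical example, not the actual fix):

```csharp
using System;

// Generic illustration of an off-by-one hazard in a RemoveAt that shifts
// the tail of a backing array left by one position.
void RemoveAt(int[] items, ref int count, int index)
{
    // Elements to shift: count - index - 1. Using count - index here would
    // copy one element past the logical end of the list.
    Array.Copy(items, index + 1, items, index, count - index - 1);
    count--;
}

int[] data = { 1, 2, 3, 4, 0 };  // capacity 5, logical length 4
int length = 4;
RemoveAt(data, ref length, 1);   // remove the value 2
```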
@kamilchodola kamilchodola requested review from a team and rubo as code owners January 27, 2026 00:02
@kamilchodola kamilchodola merged commit 99fef4b into performance Jan 27, 2026
65 of 66 checks passed
@kamilchodola kamilchodola deleted the kch/optimize_warmup branch January 27, 2026 00:02
benaadams added a commit that referenced this pull request Jan 28, 2026