Warn when dirty prune cache is too low #10143
Conversation
When the dirty prune cache is too low, pruning cannot effectively reduce the node cache, causing it to keep re-pruning with little progress. This change adds a warning when the pruning cache size after pruning is more than 80% of its size before pruning, suggesting to increase the pruning cache limit. Co-authored-by: asdacap <1841324+asdacap@users.noreply.github.com>
Extract the 0.8 threshold to PruningEfficiencyWarningThreshold constant for better code readability and maintainability. Co-authored-by: asdacap <1841324+asdacap@users.noreply.github.com>
@copilot mention the args
Updated the warning message to include the specific command-line argument (--Pruning.DirtyCacheMb) that users can use to increase the pruning cache limit. Co-authored-by: asdacap <1841324+asdacap@users.noreply.github.com>
@copilot mention the recommended value, which is the current dirty cache size + 30%.
Added calculation and display of recommended dirty cache size (current size + 30%) in the warning message to provide users with a concrete value to set. Co-authored-by: asdacap <1841324+asdacap@users.noreply.github.com>
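The recommendation rule described above (current dirty cache size + 30%) amounts to the following. A minimal Python sketch with a hypothetical helper name; the rounding behavior is an assumption, not taken from the actual implementation.

```python
# Hypothetical helper illustrating the recommended-value rule:
# recommended dirty cache = current dirty cache size + 30%, expressed in MB.
import math

MIB = 2 ** 20

def recommended_dirty_cache_mb(current_dirty_cache_bytes: int) -> int:
    """Round the 30%-padded size up to a whole number of MB."""
    return math.ceil(current_dirty_cache_bytes * 1.3 / MIB)
```

For example, a node whose dirty cache currently holds 1000 MB would be advised to raise the limit to 1300 MB.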
Added recommended value calculation (current dirty cache + 30%) to the warning message (commit 9de000a). Example output:
@copilot use
Changed PruningEfficiencyWarningThreshold from 0.8 to 0.9 (now warns when retention ratio > 90% instead of > 80%). Updated argument format in warning message from --Pruning.DirtyCacheMb to --pruning-dirtycachemb. Co-authored-by: asdacap <1841324+asdacap@users.noreply.github.com>
if (_logger.IsInfo) _logger.Info($"Executed memory prune. Took {ms:0.##} ms. Dirty memory from {memoryUsedByDirtyCache / 1.MiB()}MB to {DirtyMemoryUsedByDirtyCache / 1.MiB()}MB");

// Warn if pruning did not reduce the dirty cache significantly
if (memoryUsedByDirtyCache > 0)
I think this has the potential to warn unnecessarily on sparse networks, i.e. networks where a lot of blocks can be empty. It would be good to have a guard for that, so it only fires when the original memoryUsedByDirtyCache was big enough.
Added minimum threshold of 256MB in commit 629e60b to prevent false positives on sparse networks with many empty blocks.
Shouldn't it be relative to current pruning cache size?
Let's assume the size is the default, which is currently 1GB if I'm not mistaken. So what if the used size was 256MB and nothing was pruned? That doesn't mean dirtycachemb should be increased.
This whole function is only called when the memory used exceeds the threshold, so there should not be a problem with sparse networks.
@copilot do it
- Move _logger.IsWarn check to outer if statement to skip calculations when logging is disabled
- Add minimum threshold (256MB) to prevent false positives on sparse networks with many empty blocks
- Addresses code review feedback from @LukaszRozmej

Co-authored-by: asdacap <1841324+asdacap@users.noreply.github.com>
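The first change in this commit, hoisting the log-level check, can be sketched as follows. Python for illustration only; the callback-style `warn` parameter and all names are assumptions, not the actual C# code. The point is that the cheap level check runs first, so the ratio arithmetic is skipped entirely when warn logging is off.

```python
def maybe_warn_inefficient_pruning(is_warn_enabled: bool,
                                   memory_before: int,
                                   memory_after: int,
                                   warn) -> bool:
    """Emit the pruning-efficiency warning only when warn logging is enabled."""
    # Hoisted guard: no division is performed when warn logging is disabled.
    if not is_warn_enabled or memory_before <= 0:
        return False
    if memory_after / memory_before > 0.9:  # PruningEfficiencyWarningThreshold
        warn("Pruning was not effective; consider increasing --pruning-dirtycachemb")
        return True
    return False
```

With warn logging disabled, the function returns immediately without touching the counters.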
Co-authored-by: Lukasz Rozmej <lukasz.rozmej@gmail.com>
@copilot add a fix in a PR to the cspell issue related to
- Removed 256MB minimum threshold as function is only called when memory exceeds pruning threshold (per @asdacap feedback)
- Added cspell ignore rule for command-line arguments pattern (--something-something) to fix cspell warnings
- Addresses feedback from @LukaszRozmej, @asdacap, and @flcl42

Co-authored-by: flcl42 <630501+flcl42@users.noreply.github.com>
Added cspell ignore rule for command-line arguments in commit 298a50e. The pattern
cspell.json
  "/\"[0-9a-fA-F]{64}\"/g",
- "/https?:\\/\\/\\S+/g"
+ "/https?:\\/\\/\\S+/g",
+ "/--[a-z]+(-[a-z]+)*/gi"
@copilot No need for the *. All flags are always two words.
Changed pattern from /--[a-z]+(-[a-z]+)*/gi to /--[a-z]+-[a-z]+/gi since all CLI flags are always exactly two words (e.g., --pruning-dirtycachemb). Addresses feedback from @flcl42. Co-authored-by: flcl42 <630501+flcl42@users.noreply.github.com>
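The difference between the two patterns can be checked with any regex engine. Below is a quick demonstration using Python's re module; cspell itself uses JavaScript-flavoured regexes, but these character classes and quantifiers behave the same way here.

```python
import re

# Original pattern: two dashes, a word, then zero or more additional "-word" groups,
# so it also matches one-word flags like --verbosity.
general = re.compile(r"--[a-z]+(?:-[a-z]+)*$", re.IGNORECASE)

# Simplified pattern: exactly two words separated by a single hyphen.
two_word = re.compile(r"--[a-z]+-[a-z]+$", re.IGNORECASE)

def matches(pattern: re.Pattern, flag: str) -> bool:
    """True when the pattern matches the whole flag string."""
    return pattern.match(flag) is not None
```

The simplified pattern still covers `--pruning-dirtycachemb` while no longer matching one-word tokens, which keeps the cspell ignore rule narrowly scoped.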
Bonus: also recommend increasing CacheDB if the new value would cause an error on startup because it is too low for the recommended DirtyCacheMb.
Changes
When the dirty prune cache is undersized, pruning becomes ineffective: it repeatedly attempts to prune blocks with minimal memory reduction. This manifests on high-throughput networks (Base, mainnet with increased gas limits) as excessive pruning cycles.
- Added `PruningEfficiencyWarningThreshold` constant (0.9) to define the retention ratio threshold
- Warning message includes the `--Pruning.DirtyCacheMb` and `--pruning-dirtycachemb` configuration arguments, and a recommended cache size (current + 30%) to aid diagnostics
- Added a cspell ignore rule for two-word kebab-case CLI arguments (`--pruning-dirtycachemb`) to prevent false-positive spell check warnings

Note: The function `PersistAndPruneDirtyCache()` is only called when the pruning strategy determines memory has exceeded the threshold, so no additional minimum threshold check is needed.

Example warning message:
Types of changes
What types of changes does your code introduce?
Testing
Requires testing
If yes, did you write tests?
Notes on testing
Existing test infrastructure covers pruning logic. New code path triggers only when pruning is ineffective (rare in test scenarios).
Documentation
Requires documentation update
Requires explanation in Release Notes
Added warning to help operators identify when pruning cache configuration is insufficient for network load. The warning suggests increasing the cache limit using the `--Pruning.DirtyCacheMb` or `--pruning-dirtycachemb` argument and provides a recommended value (current dirty cache + 30%) when retention exceeds 90% after pruning. The check is optimized to skip calculations when warn-level logging is disabled, preventing unnecessary overhead. Added cspell configuration to ignore two-word kebab-case command-line argument patterns.