trie: fix a temporary memory leak in the memcache #17111
Merged
fjl (Contributor) approved these changes on Jul 3, 2018 and left a comment:
I don't understand how a node with zero parents can get into the Database, but the fix looks OK. Maybe the root cause is a missing dereference call?
Author (Member):
The node is already in the database, and the same node is recreated (not loaded). That results in it being inserted into the memcache again.
Contributor:
I'm trying out Streaming chain tracers but get an error on any commit from 67a7857 onwards, when this fix was merged:
This PR is a three-line fix. The rest is just a repro/verification.
From time to time we've received reports that Geth closes with
ERROR[...] Dangling trie nodes after full cleanup.
We've been hacking on the memcache quite a lot, so I'm unsure whether the old issues are the same as the one this PR fixes, but this one specifically addresses an issue where certain nodes remain in the memory cache even though they have no references left.
The issue happens when a trie node exists on disk (never loaded or already committed) and is recreated by the trie again (a short node split into a full node, and then merged back into a short node). This causes the node to be inserted into the memcache with a potentially invalid parent count (0). Dereferencing this trie node from memory will first decrement its counter below zero, wrapping it around to MAXINT, so the node is never cleaned out.
The node still remains part of the flush-list, so it will eventually be pushed out to disk, but in the meantime it's a dangling node. The PR fixes it by adding an explicit check for the 0-parent case.
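For illustration, here is a minimal sketch of the wraparound and of the kind of guard described above. It is not the actual diff: the `cachedNode` and `memcache` types, the `uint16` counter width, and both method names are hypothetical simplifications chosen only to show the arithmetic.

```go
package main

// cachedNode is a hypothetical, simplified stand-in for a memcache entry.
// The uint16 width is an assumption made for this sketch.
type cachedNode struct {
	parents uint16 // number of live nodes referencing this one
}

// memcache is a hypothetical in-memory node cache keyed by node hash.
type memcache map[string]*cachedNode

// derefUnchecked decrements the parent count unconditionally.
// For a node inserted with 0 parents, the unsigned counter wraps
// around to its maximum value, so the node is never evicted here.
func (m memcache) derefUnchecked(hash string) {
	node, ok := m[hash]
	if !ok {
		return
	}
	node.parents-- // 0 wraps to 65535 for a uint16
	if node.parents == 0 {
		delete(m, hash)
	}
}

// derefChecked adds an explicit check for the 0-parent case, so a
// re-inserted node with no parents is evicted instead of wrapping.
func (m memcache) derefChecked(hash string) {
	node, ok := m[hash]
	if !ok {
		return
	}
	if node.parents > 0 {
		node.parents--
	}
	if node.parents == 0 {
		delete(m, hash)
	}
}
```

Calling `derefUnchecked` on an entry whose `parents` is 0 leaves it in the map with `parents` at the maximum `uint16` value, while `derefChecked` removes the same entry.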