-
Notifications
You must be signed in to change notification settings - Fork 180
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[EN Performance] Optimize checkpoint serialization for -37GB operational RAM, -2.7 minutes duration, -19.6 million allocs (50% fewer allocs) #3050
Conversation
NodeIterator is modified to receive *node.Node instead of *trie.MTrie.
Replace very large Go map holding all unique nodes with smaller map of each subtrie to: - reduce operational memory by 37GB - reduce allocs by 19.6 million (50% of serialization allocs) - reduce duration by 2.7 minutes
@@ -115,7 +114,7 @@ func (i *NodeIterator) Next() bool { | |||
// initial call to Next() for a non-empty trie | |||
i.dig(i.unprocessedRoot) | |||
i.unprocessedRoot = nil | |||
return true | |||
return len(i.stack) > 0 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why did we change this? It seems like an important change, but I don't see why we would
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is a bug fix for a problem that didn't surface yet because of the way node iterator was used.
The bug is unique node iterator's Next()
returns true when i.unprocessedRoot
is visited already and i.stack
is empty.
This bug doesn't happen when we iterate nodes of an entire trie (root nodes are always unique). In this PR, we iterate nodes of subtries and subtries can be shared and visited already. So instead of always returning true
assuming there's at least one unique node when digging i.unprocessedRoot
, we only return true when there are unique nodes in the internal stack after calling dig(i.unprocessedRoot)
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the explanation! It does make sense.
Would it be possible to maybe add a test catching this particular bug, and showing how the fix helps?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would it be possible to maybe add a test catching this particular bug, and showing how the fix helps?
Test for this bug is already in iterator_test.go#L269-L396.
The test iterates 3 left substries and 3 right subtries (some subtries are shared). The test verifies that:
- order of iterated nodes is descendents first
- shared subtries/nodes are not iterated twice
- non-nil node is returned (meaning as long as
Next()
returns trueValue()
returns a non-nil node)
@@ -30,7 +30,7 @@ func TestPopulatedTrie(t *testing.T) { | |||
emptyTrie := trie.NewEmptyMTrie() | |||
|
|||
// key: 0000... | |||
p1 := utils.PathByUint8(1) | |||
p1 := utils.PathByUint8(0) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Its the only place we change it in a test - is this value irrelevant or internal working has changed somehow?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
is this value irrelevant or internal working has changed somehow?
Yes, this value is irrelevant and internal working hasn't changed.
The intent is to use p1
as a left leaf node and p2
as a right leaf node of the same parent.
Given p2
path is 0100 0000
created using utils.PathByUint8(64)
, p1
's path can be either of those two paths:
0000 0000
created usingutils.PathByUint8(0)
0000 0001
created usingutils.PathByUint8(1)
I changed the p1 path to utils.PathByUint8(0)
to be consistent with its comment // key: 0000...
, which doesn't change the intention of the test.
Looks good overall and really smart idea! |
@@ -76,13 +75,13 @@ type NodeIterator struct { | |||
// as for each node, the children have been previously encountered. | |||
// NodeIterator created by NewNodeIterator is safe for concurrent use | |||
// because visitedNodes is always nil in this case. | |||
func NewNodeIterator(mTrie *trie.MTrie) *NodeIterator { | |||
func NewNodeIterator(n *node.Node) *NodeIterator { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
thanks for fixing this node iterator to be a proper node iterator.
@m4ksio yeah, that was my thought initially too but other aspects like memory savings from preallocations are huge (for very large maps). Preallocation saves a lot of memory even with same map size of 1 million elements:
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good to me.
Codecov Report
@@ Coverage Diff @@
## master #3050 +/- ##
==========================================
+ Coverage 54.43% 54.47% +0.04%
==========================================
Files 722 722
Lines 66839 66910 +71
==========================================
+ Hits 36383 36449 +66
- Misses 27401 27405 +4
- Partials 3055 3056 +1
Flags with carried forward coverage won't be shown. Click here to find out more.
Help us with your feedback. Take ten seconds to tell us how you rate us. Have a feature suggestion? Share it here. |
Updated text to mention testing because another test for this PR was conducted and mentioned today (August 31, 2022). |
Primary goal is to reduce operational RAM in checkpoint v5. Secondary goals include speeding up checkpointing and redesign to simplify concurrency in the next PR.
UPDATE: 🚀 Full checkpointing v5 finishes in 12-13 minutes on EN4 and reduced peak memory use more than expected. This PR was merged on Aug 23 and deployed to EN4.mainnet19 on Oct 7, 2022.
This PR replaces largest data structure used for checkpoint serialization. During serialization, this change processes subtries instead of entire tries at once. Changes also focused on preallocations to increase memory savings.
Serializing data in parallel is made easier (because this PR splits mtrie into multiple subtries), but adding parallelism is outside the scope of this PR. Issue #3075 should be used to determine if parallelism is worthwhile (at this time) before implementing it because parallelism has tradeoffs such as consuming more RAM, etc.
Closes #2964
Updates #1744
Updates #3075
Preliminary Results Using Level 4 (16 Subtries)
Using August 12 mainnet checkpoint file:
top
command), -23GB RAM (go bench B/op)Root is at Level 0.
Benchmark used Go 1.18.5 on
benchnet-dev-004
.No benchstat comparisons yet (n=5+) due to duration and memory required.
Tests
This PR passed unit tests and round-trip tests before it was merged to master on August 23, 2022:
b2sum
of 150GB files matched).NOTE: As of Sept 13, 2022 this PR has not been merged to mainnet.
EDIT: Added more details after reading PR review questions.
Clarified root is at level 0 and we're using level 4 (16 subtries).
Mentioned tests, including round-trip tests on Aug 21 that passed before merging PR to master on Aug 23.
Mention issue #3075 to replace "issue will be opened" about adding parallelism made easier by this PR.
Make it more clear this PR is not deployed yet to mainnet.