feat: refactor the export traversal order as pre-order #662

Closed
wants to merge 10 commits into from

cool-develope
Collaborator

@cool-develope cool-develope commented Jan 17, 2023

ref: #608, #656

We are using post-order traversal in Export, but it violates ADR-001: a node's path cannot be derived from the exported ExportNodes.
This PR refactors Export/Import to provide both traversal orders, pre-order and post-order.

@cool-develope cool-develope requested a review from a team as a code owner January 17, 2023 20:54
@cool-develope
Collaborator Author

@yihuang, sorry, I referenced your PR (#656) here because there was no way to contribute to it directly.
Please review it.

@yihuang
Collaborator

yihuang commented Jan 18, 2023

That's why I don't like the choice of using path as nonce, I don't see the benefits yet, but I see the troubles 😂

@cool-develope
Collaborator Author

cool-develope commented Jan 18, 2023

That's why I don't like the choice of using path as nonce, I don't see the benefits yet, but I see the troubles 😂

I can see some significant advantages of the path. One is parallel restructuring of the tree, because assigning paths in the left and right child subtrees would be independent.
The other is that we can skip storing child node keys when the child has the same version as the parent. I believe this will reduce overall storage.
Both are advantages of pre-order iteration.
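A sketch of the first point, assuming a heap-style path encoding (root = 1, left child = 2p, right child = 2p+1). This encoding is an illustration only and not necessarily the one ADR-001 specifies. Because a child's path depends only on its parent's path, the two subtrees can be labelled independently; the sketch forks a goroutine for each left subtree purely to demonstrate that independence, whereas a practical implementation would only fork near the root. Under this encoding the leading 1 bit marks the root, so a `uint64` covers depths 0 to 63.

```go
package main

import (
	"fmt"
	"math/bits"
	"sync"
)

// node is a hypothetical minimal tree node; path is filled in by assignPaths.
type node struct {
	left, right *node
	path        uint64
}

// assignPaths gives each node a heap-style path (hypothetical encoding):
// root = 1, left child = 2p, right child = 2p+1. A child's path depends only
// on its parent's, so the left subtree is labelled in its own goroutine while
// the current goroutine handles the right subtree.
func assignPaths(n *node, path uint64) {
	if n == nil {
		return
	}
	n.path = path
	var wg sync.WaitGroup
	wg.Add(1)
	go func() {
		defer wg.Done()
		assignPaths(n.left, path<<1)
	}()
	assignPaths(n.right, path<<1|1)
	wg.Wait()
}

// depthOf recovers a node's depth from its path: the leading 1 bit is the
// root marker, so depth = bit length - 1.
func depthOf(path uint64) int { return bits.Len64(path) - 1 }

func main() {
	leaf := func() *node { return &node{} }
	root := &node{
		left:  &node{left: leaf(), right: leaf()},
		right: &node{left: leaf(), right: leaf()},
	}
	assignPaths(root, 1)
	fmt.Println(root.path, root.left.path, root.right.path)  // 1 2 3
	fmt.Println(root.right.left.path, root.right.right.path) // 6 7
	fmt.Println(depthOf(root.right.right.path))              // 2
}
```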

@yihuang
Collaborator

yihuang commented Jan 18, 2023

That's why I don't like the choice of using path as nonce, I don't see the benefits yet, but I see the troubles 😂

I can see some significant advantages of the path, one thing is parallel restricting of the tree because assigning path of left child tree and right child tree would be independent.

can you elaborate on this?

The other thing is we can skip child node keys for the same version. I believe it will reduce the entire storage. Those are the advantages of pre-order iterating.

I think the savings would be insignificant, if there are any at all:

  • The version field can always be saved if you like, no matter which nonce strategy is used.
  • Using the path as the nonce makes the nonce bigger than a simple sequential one, which means more bytes under variable-length integer encoding.
  • For most nodes, only one of the children's keys is saved.
  • The saved bytes (I'd say at most 2 bytes on average, if any, based on the reasoning above) are insignificant compared to the hash field (32 bytes) and the key field (usually dozens of bytes).
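The varint point can be checked with Go's standard `encoding/binary` unsigned varint, which stores 7 payload bits per byte. The comparison assumes, hypothetically, a heap-style path in which a node at depth d has a path value of at least 2^d; iavl's real path encoding may differ.

```go
package main

import (
	"encoding/binary"
	"fmt"
)

// uvarintLen returns how many bytes encoding/binary's unsigned varint
// encoding uses for x (7 payload bits per byte).
func uvarintLen(x uint64) int {
	var buf [binary.MaxVarintLen64]byte
	return binary.PutUvarint(buf[:], x)
}

func main() {
	// Small sequential nonces fit in one or two bytes.
	fmt.Println(uvarintLen(1), uvarintLen(127), uvarintLen(128)) // 1 1 2
	// A heap-style path for a node at depth 20 is at least 1<<20: three bytes.
	fmt.Println(uvarintLen(1 << 20)) // 3
	// A value using all 64 bits needs ten varint bytes (vs. 8 fixed-width).
	fmt.Println(uvarintLen(1 << 63)) // 10
}
```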

@cool-develope
Collaborator Author

can you elaborate on this?

OK, for example, when committing a branch, we assign paths to the new nodes by iterating the tree. We can do this iteration and assignment in parallel, and there would be more use cases.

the version field can always be saved if you like, no matter what nonce strategy used.

That's not true: now that we have leftNodeKey and rightNodeKey instead of leftHash and rightHash, we store the whole child node key.

using path as nonce make the nonce bigger than a simply sequential one, which means more bytes under variable-length integer encoding.

version is a uint64 and requires 8 bytes. I think 8 bytes is enough for the path as well: 8 bytes = 64 bits, i.e. a tree height of 64.

@yihuang
Collaborator

yihuang commented Jan 18, 2023

can you elaborate on this?

OK, for example when commit the branch, we are trying to assign the path for new nodes with iterating the tree. we can do this iterating and assigning parallelly, there would be more use cases

Again, iteration and nonce assignment are a trivial part of committing a branch, especially with sequential nonce assignment; computing hashes is the heavy part. If hash computation could be parallelized, that would be very useful.

the version field can always be saved if you like, no matter what nonce strategy used.

it's not true, now we have leftNodeKey and rightNodeKey instead of leftHash and rightHash, we will save the whole child node key

A node key is just a tuple (version, nonce), right? I mean that, no matter which nonce strategy we choose, the version field can always be saved: encode a special empty value and decode it as the parent node's version number.
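One way to read the "special empty value" idea, as a hypothetical sketch rather than iavl's actual node encoding: store the child's version as a uvarint and reserve 0 to mean "same version as the parent", which assumes real versions start at 1. A same-version child key then costs a single byte for the version. `binary.AppendUvarint` requires Go 1.19 or later.

```go
package main

import (
	"encoding/binary"
	"errors"
	"fmt"
)

// nodeKey is the (version, nonce) tuple discussed above.
type nodeKey struct {
	version int64
	nonce   uint32
}

// encodeChildKey writes a child's key relative to its parent's version.
// Hypothetical scheme: version 0 is reserved for "same as the parent".
func encodeChildKey(buf []byte, parentVersion int64, child nodeKey) []byte {
	v := uint64(child.version)
	if child.version == parentVersion {
		v = 0
	}
	buf = binary.AppendUvarint(buf, v)
	return binary.AppendUvarint(buf, uint64(child.nonce))
}

// decodeChildKey reverses encodeChildKey and returns the bytes consumed.
func decodeChildKey(buf []byte, parentVersion int64) (nodeKey, int, error) {
	v, n1 := binary.Uvarint(buf)
	if n1 <= 0 {
		return nodeKey{}, 0, errors.New("bad version")
	}
	nonce, n2 := binary.Uvarint(buf[n1:])
	if n2 <= 0 {
		return nodeKey{}, 0, errors.New("bad nonce")
	}
	version := int64(v)
	if v == 0 {
		version = parentVersion // the empty value decodes as the parent's version
	}
	return nodeKey{version: version, nonce: uint32(nonce)}, n1 + n2, nil
}

func main() {
	const parent = 1_000_000
	same := encodeChildKey(nil, parent, nodeKey{version: parent, nonce: 5})
	older := encodeChildKey(nil, parent, nodeKey{version: 999_000, nonce: 5})
	fmt.Println(len(same), len(older)) // 2 4
	k, _, _ := decodeChildKey(same, parent)
	fmt.Println(k.version, k.nonce) // 1000000 5
}
```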

using path as nonce make the nonce bigger than a simply sequential one, which means more bytes under variable-length integer encoding.

version is uint64 and it requires 8bytes, I think 8bytes is enough for path, 8bytes = 64bit (64 height in tree)

I assume we use variable-length integer encoding here, so larger integers take more space on average.

I can also point out some advantages of a sequential nonce over the path:

  • The nodes in a version can be stored in a contiguous array: version -> [node, node, ...], with the nonce used as the array index.
  • It works with any traversal order, because the exact nonce assignment is only a local decision, not something the network needs consensus on.
  • It's just so much simpler.
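The first advantage can be sketched as follows. `versionStore` is a hypothetical in-memory model, not an iavl type: each version's new nodes live in one slice, and a 1-based sequential nonce is simply an index into it, so resolving a `(version, nonce)` key is a map lookup plus a slice index.

```go
package main

import "fmt"

// node is a hypothetical stand-in for a persisted IAVL node.
type node struct {
	key []byte
}

// versionStore keeps each version's new nodes in a contiguous slice:
// version -> [node, node, ...]; the nonce (1-based) is the slice position.
type versionStore struct {
	nodes map[int64][]*node
}

func newVersionStore() *versionStore {
	return &versionStore{nodes: make(map[int64][]*node)}
}

// add appends n to the given version and returns its sequential nonce.
func (s *versionStore) add(version int64, n *node) uint32 {
	s.nodes[version] = append(s.nodes[version], n)
	return uint32(len(s.nodes[version]))
}

// get resolves a (version, nonce) node key with a single slice lookup.
func (s *versionStore) get(version int64, nonce uint32) (*node, bool) {
	ns := s.nodes[version]
	if nonce == 0 || int(nonce) > len(ns) {
		return nil, false
	}
	return ns[nonce-1], true
}

func main() {
	s := newVersionStore()
	s.add(7, &node{key: []byte("a")})
	nonce := s.add(7, &node{key: []byte("b")})
	n, ok := s.get(7, nonce)
	fmt.Println(nonce, ok, string(n.key)) // 2 true b
}
```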

@cool-develope
Collaborator Author

cool-develope commented Jan 18, 2023

again, iteration and assigning nonce is a trivial part during committing branch, computing hash is the heavy one, if it can parallelize hash computation, that'd be very useful.

Right, I think we can parallelize hash calculation together with path assignment (which is impossible with a sequence id).

Regarding (version, nonce), I think there would be a problem encoding/decoding nodes if we use an empty value for the version.

Anyhow, even with a sequence id as the nonce, post-order would be a problem in export/import. And I don't think this is the right place for this topic.

@cool-develope
Collaborator Author

I just think pre-order is better than post-order, at least in our IAVL; I can't find a reason to use post-order.

@yihuang
Collaborator

yihuang commented Jan 18, 2023

I am just thinking pre-order is better than post-order at least in our iavl, I can't find the reason why use post-order

My biggest concern is actually consensus breakage. Without using the path as the nonce, the new node key format is not a consensus-breaking change, which would make it much easier to roll out across the network node by node, asynchronously.

@yihuang
Collaborator

yihuang commented Jan 18, 2023

anyhow, even using the sequence id as a nonce, post-order would be a problem in export/import. and I think it is not a good place for this topic.

With post-order, we just assign nonces in post-order; technically the nonce only needs to be unique within the version, right?
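A hypothetical sketch of post-order nonce assignment at commit time: only nodes created in the working version get a nonce, numbered 1, 2, 3, ... in post-order, so nonces are unique within that version. Subtrees rooted at older nodes are skipped; this relies on IAVL's copy-on-write updates, where every ancestor of a changed node is itself new. The `node` type and its fields are illustrative, not iavl's.

```go
package main

import "fmt"

// node is hypothetical: nodes loaded from disk already carry a nonce,
// nodes created in the working version start with nonce 0.
type node struct {
	name        string
	version     int64
	nonce       uint32
	left, right *node
}

// assignNonces walks the tree in post-order (children before parent) and
// numbers the nodes created in `version` as 1, 2, 3, ... Nodes from older
// versions keep their keys, and their subtrees are not visited.
func assignNonces(n *node, version int64, next *uint32) {
	if n == nil || n.version != version {
		return
	}
	assignNonces(n.left, version, next)
	assignNonces(n.right, version, next)
	*next++
	n.nonce = *next
}

func main() {
	old := &node{name: "old", version: 4, nonce: 9}
	newLeaf := &node{name: "leaf", version: 5}
	inner := &node{name: "inner", version: 5, left: old, right: newLeaf}
	root := &node{name: "root", version: 5, left: inner,
		right: &node{name: "old2", version: 3, nonce: 1}}
	var next uint32
	assignNonces(root, 5, &next)
	fmt.Println(newLeaf.nonce, inner.nonce, root.nonce, old.nonce) // 1 2 3 9
}
```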

@cool-develope
Collaborator Author

cool-develope commented Jan 18, 2023

I am just thinking pre-order is better than post-order at least in our iavl, I can't find the reason why use post-order

my biggest concern is actually consensus breaking, without using path as nonce, the new node key format is not a consensus breaking change, that'd make it much more easier to rollout to the network node to node asynchronously.

What is your plan for migrating to the new version?

My idea is that we can restructure the storage using export/import; there is no way to do a soft landing.

@yihuang
Collaborator

yihuang commented Jan 19, 2023

I am just thinking pre-order is better than post-order at least in our iavl, I can't find the reason why use post-order

my biggest concern is actually consensus breaking, without using path as nonce, the new node key format is not a consensus breaking change, that'd make it much more easier to rollout to the network node to node asynchronously.

what is your plan to migrate to new version?

my idea is we can restrict the storage using export/import, there is a no way of soft landing

I'm striving for a non-consensus-breaking version that only does storage optimization. The first step is versiondb running alongside the existing IAVL tree; the second step should be optimizing the IAVL tree itself. I think there is already a lot of potential before introducing breaking changes.

@yihuang
Collaborator

yihuang commented Jan 19, 2023

right, I think we can parallelize hash calc with assigning the path (sequence id is impossible)

I think parallel hash computation is a very interesting topic. The difficult part is that most of the time we create branches rather than full subtrees, so the potential for parallelism is low and a naïve implementation could easily end up slower than a sequential one. But I don't see why we can't do it with a sequential nonce: we can iterate sequentially while distributing the hashing work.
I'll research parallel hash computation more; it'll be useful in our change set verification step.

@kocubinski kocubinski self-assigned this Jan 19, 2023
@cool-develope
Collaborator Author

cool-develope commented Jan 19, 2023

@yihuang
OK, I will close this PR after incorporating the idea into the node-key refactoring branch; let's discuss this further at the next storage meeting.
BTW, could you review #646? It's not consensus-breaking.

@cool-develope
Collaborator Author

cool-develope commented Jan 23, 2023

@yihuang, I just remembered why version + nonce doesn't work in Import.
There is no way to get the current sequence number of a given version, and you can't keep this state in memory (e.g. map[int64]int32) because there could be a massive number of versions (int64).

@yihuang
Collaborator

yihuang commented Jan 23, 2023

@yihuang , I just remembered why the version + nonce is not working in the Import. There is no way to get the current sequence integer of the given version, you can't keep this status in memory (like map[int64]int32) because there could be massive versions (int64)

May I ask what the plan is for rebuilding the path, even with pre-order export? The path needs to be the node's path in the version it was created in, not in the exported version, right?

@cool-develope
Collaborator Author

cool-develope commented Jan 23, 2023

May I know what's the plan to rebuild the path even with pre-order export? the path need to be the path in the version node created in, not the exported version, right?

I plan to build the current node's path from its parent's path, even if the node is inherited from a different version.
Of course, we could use a globally unique nonce via big.Int, but it would be expensive.

@yihuang
Collaborator

yihuang commented Jan 24, 2023

May I know what's the plan to rebuild the path even with pre-order export? the path need to be the path in the version node created in, not the exported version, right?

I plan to build the current node path based on the parent one even it is inherited from the different version.

Doesn't that break the assumption of the path design?

@cool-develope
Collaborator Author

Isn't that breaks assumption of path design?

Yeah, it might not be exactly the same, but it doesn't affect any logic in the ADR.

@cool-develope
Collaborator Author

@yihuang, how about adding a flag to denote whether this export is post-order or pre-order? Then it would not be consensus-breaking, right?

@yihuang
Collaborator

yihuang commented Jan 26, 2023

@yihuang , how about adding a flag to denote if this export is post-order or pre-order? then it would not be a consensus breaking, right?

There's a format version field in the state sync snapshot. Will the new node key format support both formats? If so, then it doesn't break anything.

@cool-develope
Collaborator Author

@yihuang , how about adding a flag to denote if this export is post-order or pre-order? then it would not be a consensus breaking, right?

there's format version field in state sync snapshot, will the new node key format support both format? if that's the case, then it don't breaks anything.

I don't know exactly how this would interact with cosmos-sdk (the snapshot format), but that's right: iavl will provide both post-order and pre-order. What do you think, @tac0turtle?

@cool-develope
Collaborator Author

@yihuang, done; please review again.
@tac0turtle, do we need to update cosmos/store?

@tac0turtle
Member

We will update store when this version is released in alpha/beta.

Member

@tac0turtle tac0turtle left a comment

My understanding of this PR is that the new node key refactor works with both post-order and pre-order import; if so, do we need both? I may be missing why both are needed if the node key refactor will be merged.

export.go
tac0turtle
tac0turtle previously approved these changes Jan 27, 2023
Member

@tac0turtle tac0turtle left a comment

LGTM. I left one request for a godoc; after that, let's merge this.

@cool-develope
Collaborator Author

@kocubinski @yihuang
I updated it to provide both pre-order and post-order; please review it!

@tac0turtle tac0turtle dismissed their stale review January 31, 2023 10:44

dismissing the approval as the implementation landed

@@ -155,8 +155,8 @@ func (t *ImmutableTree) Hash() ([]byte, error) {

// Export returns an iterator that exports tree nodes as ExportNodes. These nodes can be
// imported with MutableTree.Import() to recreate an identical tree.
func (t *ImmutableTree) Export() (*Exporter, error) {
return newExporter(t)
func (t *ImmutableTree) Export(traverseOrder OrderType) (*Exporter, error) {
Member

Do we really need an API breaking change here? Maybe adding a new method ExportPreOrder would be better.

Collaborator Author

that sounds great
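The suggestion amounts to keeping `Export()` source-compatible and adding a separate entry point. A trimmed sketch: `OrderType` comes from this PR's diff, while the constant names `PostOrder`/`PreOrder`, the `order` field, and the two-argument `newExporter` are hypothetical, and the types are stand-ins rather than iavl's real ones.

```go
package main

import "fmt"

// OrderType selects the traversal order (type name from this PR's diff;
// the constant names are hypothetical).
type OrderType int

const (
	PostOrder OrderType = iota // existing default (LRN)
	PreOrder                   // NLR, needed by the node-key refactor
)

// Exporter and ImmutableTree are trimmed, hypothetical stand-ins.
type Exporter struct{ order OrderType }

type ImmutableTree struct{}

func newExporter(t *ImmutableTree, order OrderType) (*Exporter, error) {
	return &Exporter{order: order}, nil
}

// Export keeps its original signature, so existing callers compile unchanged.
func (t *ImmutableTree) Export() (*Exporter, error) {
	return newExporter(t, PostOrder)
}

// ExportPreOrder is the additive, non-breaking alternative suggested in review.
func (t *ImmutableTree) ExportPreOrder() (*Exporter, error) {
	return newExporter(t, PreOrder)
}

func main() {
	var tree ImmutableTree
	e1, _ := tree.Export()
	e2, _ := tree.ExportPreOrder()
	fmt.Println(e1.order == PostOrder, e2.order == PreOrder) // true true
}
```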

mutable_tree.go Outdated
@cool-develope cool-develope requested review from kocubinski, yihuang and tac0turtle and removed request for yihuang January 31, 2023 15:36
import.go Outdated
@@ -63,7 +102,7 @@ func (i *Importer) Close() {
}

// Add adds an ExportNode to the import. ExportNodes must be added in the order returned by
// Exporter, i.e. depth-first post-order (LRN). Nodes are periodically flushed to the database,
// Exporter, i.e. depth-first pre-order (NLR). Nodes are periodically flushed to the database,
Member

This comment is a bit misleading to me. We could have post- or pre-ordered nodes. Let's be explicit that the caller must choose to import in the same order as was exported.

node.leftHash = node.leftNode.hash
node.rightNode = i.stack[stackSize-1]
node.rightHash = node.rightNode.hash
case stackSize >= 1 && i.stack[stackSize-1].subtreeHeight < node.subtreeHeight:
Member

Where did this branch go? If post-order export didn't change why are we now handling import differently?

Collaborator Author

It's related to #656: inner nodes always have two children.
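Given that invariant, post-order import reduces to a simple stack algorithm. The sketch below is a simplified, hypothetical version of what `Importer.Add` does (no hashing, no database flushes, hypothetical types): leaves are pushed with subtree size 1, and each inner node pops exactly two children, so a one-child case never arises.

```go
package main

import (
	"errors"
	"fmt"
)

// exportNode and node are simplified, hypothetical stand-ins.
type exportNode struct {
	Key    string
	Height int8
}

type node struct {
	key         string
	height      int8
	size        int64
	left, right *node
}

// importPostOrder rebuilds a tree from post-order (LRN) export nodes.
// Leaves are pushed; an inner node pops its left and right children.
func importPostOrder(nodes []exportNode) (*node, error) {
	var stack []*node
	for _, en := range nodes {
		n := &node{key: en.Key, height: en.Height}
		if n.height == 0 {
			n.size = 1 // leaf subtree size is 1
		} else {
			if len(stack) < 2 {
				return nil, errors.New("inner node without two children")
			}
			n.left, n.right = stack[len(stack)-2], stack[len(stack)-1]
			stack = stack[:len(stack)-2]
			n.size = n.left.size + n.right.size
		}
		stack = append(stack, n)
	}
	if len(stack) != 1 {
		return nil, fmt.Errorf("expected 1 root, got %d nodes on stack", len(stack))
	}
	return stack[0], nil
}

func main() {
	// Post-order export of the tree with leaves a, b, c, d.
	in := []exportNode{{"a", 0}, {"b", 0}, {"b", 1}, {"c", 0}, {"d", 0}, {"d", 1}, {"c", 2}}
	root, err := importPostOrder(in)
	fmt.Println(root.key, root.size, err) // c 4 <nil>
}
```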

@@ -83,82 +122,57 @@ func (i *Importer) Add(exportNode *ExportNode) error {
version: exportNode.Version,
subtreeHeight: exportNode.Height,
}

if node.subtreeHeight == 0 {
Member

Suggested change
if node.subtreeHeight == 0 {
// set leaf nodes subtree size = 1
if node.subtreeHeight == 0 {

@@ -321,7 +321,7 @@ func (node *Node) validate() error {
if node.value != nil {
return errors.New("value must be nil for non-leaf node")
}
if node.leftHash == nil && node.rightHash == nil {
if node.leftHash == nil || node.rightHash == nil {
Member

Could you explain this change, it's more restrictive than previous, right?
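For reference, the difference between the two conditions, sketched with a hypothetical trimmed `node` and an illustrative error message (the real message after the `value` check is not shown in the diff): the old `&&` rejected an inner node only when both child hashes were missing, so a node with a single child passed; the new `||` rejects it as soon as either is missing, which matches the two-children invariant from #656.

```go
package main

import (
	"errors"
	"fmt"
)

// node is a trimmed, hypothetical stand-in with only the fields checked here.
type node struct {
	height              int8
	value               []byte
	leftHash, rightHash []byte
}

// validateInner applies the non-leaf checks from the diff using the new
// `||` condition: an inner node missing either child hash is rejected.
func validateInner(n *node) error {
	if n.value != nil {
		return errors.New("value must be nil for non-leaf node")
	}
	if n.leftHash == nil || n.rightHash == nil {
		return errors.New("inner node must have both children") // illustrative message
	}
	return nil
}

func main() {
	oneChild := &node{height: 1, leftHash: []byte{1}}
	both := &node{height: 1, leftHash: []byte{1}, rightHash: []byte{2}}
	fmt.Println(validateInner(oneChild) != nil, validateInner(both) == nil) // true true
}
```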

Collaborator Author

type OrderType int

// OrderTraverse is the type of traversal order to use when exporting and importing.
// PreOrder is needed for the new node-key refactoring. The default is PostOrder.
Collaborator

I thought new node-key refactoring works with both orderings?

Collaborator Author

No, pre-order is needed for the node-key refactoring.

Member

Slightly confused now. This PR adds support for both pre-order and post-order, but the node key refactor requires pre-order. If a node exports using post-order, then we can't import it into the new version, correct?

Collaborator Author

Yes, that's why this PR provides both orders.

Collaborator Author

Even on the original version, we can request a pre-order snapshot and then import it into the new version.

Collaborator

I think post-order was chosen in the first place because node hashes are updated in post-order: you need to update the children before you can update the parent. Will pre-order import need more temporary memory?

Collaborator Author

Yeah, post-order literally makes more sense. Both ways require a stack to keep the current path. You're right that pre-order will require twice the memory, but the stack length is at most the height of the tree, so it's trivial.
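A sketch of pre-order import for comparison, using the same kind of hypothetical simplified types (no hashing or persistence). Here the stack holds inner nodes still waiting for children, so its depth is bounded by the tree height. A node's size (and, in a real importer, its hash) can only be finalized once its right subtree completes; that completion bookkeeping is what post-order gets for free.

```go
package main

import (
	"errors"
	"fmt"
)

// exportNode and node are simplified, hypothetical stand-ins.
type exportNode struct {
	Key    string
	Height int8
}

type node struct {
	key         string
	height      int8
	size        int64
	left, right *node
}

// importPreOrder rebuilds a tree from pre-order (NLR) export nodes.
func importPreOrder(nodes []exportNode) (*node, error) {
	var stack []*node // inner nodes whose subtrees are still incomplete
	var root *node
	for _, en := range nodes {
		n := &node{key: en.Key, height: en.Height}
		if len(stack) == 0 {
			if root != nil {
				return nil, errors.New("more than one root")
			}
			root = n
		} else if parent := stack[len(stack)-1]; parent.left == nil {
			parent.left = n
		} else {
			parent.right = n
		}
		if n.height > 0 {
			stack = append(stack, n) // wait for its two children
			continue
		}
		n.size = 1
		// A finished right child completes its parent; propagate upward.
		done := n
		for len(stack) > 0 && stack[len(stack)-1].right == done {
			p := stack[len(stack)-1]
			stack = stack[:len(stack)-1]
			p.size = p.left.size + p.right.size
			done = p
		}
	}
	if root == nil || len(stack) != 0 {
		return nil, errors.New("incomplete tree")
	}
	return root, nil
}

func main() {
	// Pre-order export of the tree with leaves a, b, c, d.
	in := []exportNode{{"c", 2}, {"b", 1}, {"a", 0}, {"b", 0}, {"d", 1}, {"c", 0}, {"d", 0}}
	root, err := importPreOrder(in)
	fmt.Println(root.key, root.size, err) // c 4 <nil>
}
```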

Collaborator Author

@tac0turtle, the old version can use either order; the new version should use pre-order.

Collaborator

but the stack length is at most the height of the tree, it is so trivial

Yeah, that should be trivial then. If this is indeed necessary, I think chains can do a coordinated upgrade in advance to switch to pre-order snapshots before doing the node key format migration.

Collaborator Author

yeah, that should be trivial then, if this is indeed necessary, I think the chains can do a coordinated upgrade in advance to switch to pre-order snapshots, before doing node key format migration.

I think this is what we should discuss; tbh, I have no clear idea which way is more efficient.

@cool-develope
Collaborator Author

@yihuang #662 (comment)

@yihuang
Collaborator

yihuang commented Feb 1, 2023

@yihuang , I just remembered why the version + nonce is not working in the Import. There is no way to get the current sequence integer of the given version, you can't keep this status in memory (like map[int64]int32) because there could be massive versions (int64)

In this case, we can at least use a globally increasing unique nonce, similar to how you have to use the path in the current version (instead of the node's creation version).

@cool-develope
Collaborator Author

@yihuang , I just remembered why the version + nonce is not working in the Import. There is no way to get the current sequence integer of the given version, you can't keep this status in memory (like map[int64]int32) because there could be massive versions (int64)

In this case, we can at least use a global increasing unique nonce, similar to how you have to use the path in current version (instead of node creation version).

It might lead to several problems in the implementation: int32 is enough for tree operations, but it would need to be int64 or big.Int in export/import.

@yihuang
Collaborator

yihuang commented Feb 1, 2023

@yihuang , I just remembered why the version + nonce is not working in the Import. There is no way to get the current sequence integer of the given version, you can't keep this status in memory (like map[int64]int32) because there could be massive versions (int64)

In this case, we can at least use a global increasing unique nonce, similar to how you have to use the path in current version (instead of node creation version).

It might lead to several problems in the implementation, int32 is enough in the tree operation, but it will be int64 or big.Int in export/import

A continuously increasing number is at least smaller than the path, which is also a unique integer across nodes, right?

@yihuang
Collaborator

yihuang commented Feb 1, 2023

There is no way to get the current sequence integer of the given version, you can't keep this status in memory (like map[int64]int32) because there could be massive versions (int64)

But in practice there are only millions or tens of millions of versions. A contiguous array of 10 million int32s is only (10,000,000 × 4) / 1024 / 1024 ≈ 38.15 MiB, which should be practical, I think.

@cool-develope
Collaborator Author

But in practice, there are only millions or tens of millions of versions, a continuous array of 10millions of int32s is only: (10000000 * 4) / 1024/1024 = 38.15 megabytes, should be practical I think.

That makes sense; it will be at most hundreds of MB. I will implement version + local nonce; let's do some benchmarks.

@tac0turtle
Member

Closing this, as we don't need pre-order reconstruction. We discussed this yesterday on the storage working group call.

@tac0turtle tac0turtle closed this Feb 16, 2023
4 participants