Performance characteristics of the hash function for trees #55
-
Besides starting this discussion to document the above, I also want to raise a question. Above I assumed that hashing is commutative and associative (again, w.r.t. security; the final hashes are different, but that is not what I was analyzing). That seems a fair assumption to me, since we rely on that behavior to build Merkle trees. I wonder if we can do the same for hashing of messages, so that a message could be hashed in parallel. AFAIU, the hashing is currently done sequentially because we have a dependency in the capacity word, i.e. each iteration depends on the capacity output of the previous one.

Edit: There is probably a minimum size for this strategy to start to make sense, see my next comment (the question still stands: is there a reason not to split a message into chunks and hash them in parallel?)
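To make the idea concrete, here is a minimal sketch of the chunked, parallel hashing strategy. It assumes the miden-crypto `Rpo256` API (`hash_elements`, `merge`) and the `rayon` crate; the chunk size and the way the chunk digests are combined are illustrative choices, not an existing design, and the resulting digest differs from the one produced by sequential hashing of the same message.

```rust
use miden_crypto::{hash::rpo::{Rpo256, RpoDigest}, Felt};
use rayon::prelude::*;

/// Hash a (non-empty) message by hashing fixed-size chunks independently and
/// then combining the chunk digests with the 2-to-1 merge.
fn hash_message_in_parallel(message: &[Felt], chunk_len: usize) -> RpoDigest {
    // 1. Hash each chunk independently; these calls have no data dependency,
    //    so rayon can run them on separate threads.
    let chunk_digests: Vec<RpoDigest> = message
        .par_chunks(chunk_len)
        .map(|chunk| Rpo256::hash_elements(chunk))
        .collect();

    // 2. Combine the chunk digests; done left to right here for brevity, but
    //    this step could itself be a pair-wise (Merkle-style) reduction.
    chunk_digests[1..]
        .iter()
        .fold(chunk_digests[0], |acc, d| Rpo256::merge(&[acc, *d]))
}
```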
-
Another question: do we have a faster operation to merge two hash digests? I was careful to talk about hash of hashes in the initial discussion, and not hash of a message. The difference is that the former is a fixed-size, 2-to-1 operation, while the latter has arbitrary length.
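One way to quantify this is to benchmark the 2-to-1 merge against hashing the same amount of field elements. A hedged sketch using the miden-crypto `Rpo256` API and the `criterion` crate (names and setup are assumptions, not an existing benchmark in the repo):

```rust
use criterion::{black_box, criterion_group, criterion_main, Criterion};
use miden_crypto::{hash::rpo::Rpo256, Felt};

fn bench_merge_vs_hash(c: &mut Criterion) {
    // Two digests to merge, derived from arbitrary inputs.
    let d1 = Rpo256::hash_elements(&[Felt::new(1); 4]);
    let d2 = Rpo256::hash_elements(&[Felt::new(2); 4]);
    // Eight field elements: the same amount of input as two digests.
    let elements = [Felt::new(3); 8];

    c.bench_function("rpo/merge two digests", |b| {
        b.iter(|| Rpo256::merge(black_box(&[d1, d2])))
    });
    c.bench_function("rpo/hash_elements 8 felts", |b| {
        b.iter(|| Rpo256::hash_elements(black_box(&elements[..])))
    });
}

criterion_group!(benches, bench_merge_vs_hash);
criterion_main!(benches);
```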
-
I'm opening this to discuss the runtime behavior of hashing. My goal is to share some observations and document some of the costs associated with it, so that later it will be easier to characterize the runtime costs of the trees and of the algorithms using these data structures. I assume that hashing is the dominating cost; other costs like allocation, copies, bit shifting, etc. should be small enough to be ignored in comparison, so I'm focusing on hashing.
The versions below should have the same security level:

- `hash(h1, h2)` and `hash(h2, h1)` (the order of the operands is not important)
- `hash(hash(h1, h2), h3)` and `hash(h1, hash(h2, h3))` (the grouping of the operands is not important)

Meaning the configurations above produce different results, but in terms of security there is no preference of one over the other. (This is what we use when building plain Merkle trees with no determined order of the leaves.)
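A tiny check of that claim, assuming the miden-crypto `Rpo256` API: swapping the operands changes the digest value; the equivalence is only about the security level.

```rust
use miden_crypto::{hash::rpo::Rpo256, Felt};

fn main() {
    let h1 = Rpo256::hash_elements(&[Felt::new(1)]);
    let h2 = Rpo256::hash_elements(&[Felt::new(2)]);
    // Different output values, but both are equally good 2-to-1 commitments
    // to the unordered pair {h1, h2}.
    assert_ne!(Rpo256::merge(&[h1, h2]), Rpo256::merge(&[h2, h1]));
}
```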
This means that when we are hashing a sequence of hashes, e.g. `hash(h1, h2, h3, h4)`, we can implement it in a few different ways. It could be done from left to right `hash(hash(hash(h1, h2), h3), h4)`, pair-wise `hash(hash(h1, h2), hash(h3, h4))`, or any configuration in between.

All these configurations can be represented as a free tree (a connected graph with no cycles), where the leaves are the hash inputs (h1, h2, ...) plus the root, and the internal nodes represent the hashing of values together. For RPO, which has a 2-to-1 ratio, every internal node has degree 3 (the two inputs and the one output), which makes the representation an unrooted binary tree. There are many such trees, but they all have one thing in common: for n leaves (the inputs plus the root) there are n-2 internal nodes. This is to say, regardless of the configuration used, there will always be n-2 hashes performed to compute the root when using RPO.

Given that there are many different configurations, but they all have the same number of hash calls, the preferred ordering IMO should be pair-wise, so that we have a proper tree (every node has either 0 or 2 children) with minimal path length, meaning that most operations can be performed in parallel. Using the example above, `hash(h1, h2)` and `hash(h3, h4)` can be done in parallel (see the sketch below).
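For concreteness, here is a hedged sketch of the two orderings over a non-empty list of digests, assuming the miden-crypto `Rpo256::merge` API. Both perform exactly the same number of merge calls; the pair-wise version just groups them into levels of independent calls.

```rust
use miden_crypto::hash::rpo::{Rpo256, RpoDigest};

/// Left to right: hash(hash(hash(h1, h2), h3), h4).
/// For k inputs this performs k - 1 strictly sequential merges.
fn fold_left(digests: &[RpoDigest]) -> RpoDigest {
    digests[1..]
        .iter()
        .fold(digests[0], |acc, d| Rpo256::merge(&[acc, *d]))
}

/// Pair-wise: hash(hash(h1, h2), hash(h3, h4)).
/// Still k - 1 merges in total, but grouped into ceil(log2(k)) levels whose
/// merges are independent of each other and could run in parallel.
fn fold_pairwise(digests: &[RpoDigest]) -> RpoDigest {
    let mut level: Vec<RpoDigest> = digests.to_vec();
    while level.len() > 1 {
        level = level
            .chunks(2)
            .map(|pair| {
                if pair.len() == 2 {
                    Rpo256::merge(&[pair[0], pair[1]])
                } else {
                    pair[0] // an odd digest is carried up to the next level
                }
            })
            .collect();
    }
    level[0]
}
```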