Performance characteristics of the hash function for trees #55
-
Besides starting this discussion to document the above, I also want to raise a question. Above I assumed that hashing is commutative and associative (again, w.r.t. security; the final hashes are different, but that is not what I was analyzing). That seems a fair assumption to me, since we rely on that behavior to build Merkle trees. I wonder if we can do the same for hashing of messages, so that a message could be hashed in parallel. AFAIU, the hashing is currently done sequentially because we have a dependency in the capacity word, i.e. each iteration depends on the capacity output of the previous one.

Edit: There is probably a minimum size for this strategy to start to make sense, see my next comment (the question still stands: is there a reason not to split a message into chunks and hash them in parallel?)
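To make the idea concrete, here is a minimal sketch of the chunked, parallel hashing strategy. It assumes the miden-crypto `Rpo256` API (`hash_elements`, `merge`) and the `rayon` crate; the chunk size and the way the chunk digests are combined are illustrative choices, not an existing design, and the resulting digest differs from the one produced by sequential hashing of the same message.

```rust
use miden_crypto::{hash::rpo::{Rpo256, RpoDigest}, Felt};
use rayon::prelude::*;

/// Hash a (non-empty) message by hashing fixed-size chunks independently and
/// then combining the chunk digests with the 2-to-1 merge.
fn hash_message_in_parallel(message: &[Felt], chunk_len: usize) -> RpoDigest {
    // 1. Hash each chunk independently; these calls have no data dependency,
    //    so rayon can run them on separate threads.
    let chunk_digests: Vec<RpoDigest> = message
        .par_chunks(chunk_len)
        .map(|chunk| Rpo256::hash_elements(chunk))
        .collect();

    // 2. Combine the chunk digests; done left to right here for brevity, but
    //    this step could itself be a pair-wise (Merkle-style) reduction.
    chunk_digests[1..]
        .iter()
        .fold(chunk_digests[0], |acc, d| Rpo256::merge(&[acc, *d]))
}
```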
-
Another question: do we have a faster operation to merge two hash digests? I was careful to talk about hash of hashes in the initial discussion, and not hash of a message. The difference is that the former is a fixed-size, 2-to-1 operation, while the latter has arbitrary length.
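One way to quantify this is to benchmark the 2-to-1 merge against hashing the same amount of field elements. A hedged sketch using the miden-crypto `Rpo256` API and the `criterion` crate (names and setup are assumptions, not an existing benchmark in the repo):

```rust
use criterion::{black_box, criterion_group, criterion_main, Criterion};
use miden_crypto::{hash::rpo::Rpo256, Felt};

fn bench_merge_vs_hash(c: &mut Criterion) {
    // Two digests to merge, derived from arbitrary inputs.
    let d1 = Rpo256::hash_elements(&[Felt::new(1); 4]);
    let d2 = Rpo256::hash_elements(&[Felt::new(2); 4]);
    // Eight field elements: the same amount of input as two digests.
    let elements = [Felt::new(3); 8];

    c.bench_function("rpo/merge two digests", |b| {
        b.iter(|| Rpo256::merge(black_box(&[d1, d2])))
    });
    c.bench_function("rpo/hash_elements 8 felts", |b| {
        b.iter(|| Rpo256::hash_elements(black_box(&elements[..])))
    });
}

criterion_group!(benches, bench_merge_vs_hash);
criterion_main!(benches);
```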
-
I'm opening this to discuss the runtime behavior of hashing. My goal is to share some observations and document some of the costs associated with it, so that later it will be easier to characterize the runtime costs of the trees and of the algorithms using these data structures. I assume that hashing is the dominating cost; other costs like allocation, copies, bit shifting, etc. should be small enough to be ignored in comparison, so I'm focusing on hashing.
The versions below should have the same security level:

- `hash(h1, h2)` and `hash(h2, h1)` (the order of the operands is not important)
- `hash(hash(h1, h2), h3)` and `hash(h1, hash(h2, h3))` (the grouping of the operands is not important)

Meaning the configurations above produce different results, but in terms of security there is no preference of one over the other. (This is what we use when building plain Merkle trees with no determined order of the leaves.)
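A tiny check of that claim, assuming the miden-crypto `Rpo256` API: swapping the operands changes the digest value; the equivalence is only about the security level.

```rust
use miden_crypto::{hash::rpo::Rpo256, Felt};

fn main() {
    let h1 = Rpo256::hash_elements(&[Felt::new(1)]);
    let h2 = Rpo256::hash_elements(&[Felt::new(2)]);
    // Different output values, but both are equally good 2-to-1 commitments
    // to the unordered pair {h1, h2}.
    assert_ne!(Rpo256::merge(&[h1, h2]), Rpo256::merge(&[h2, h1]));
}
```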
This means that when we are hashing a sequence of hashes, e.g. `hash(h1, h2, h3, h4)`, we can implement it in a few different ways. It could be done from left to right `hash(hash(hash(h1, h2), h3), h4)`, pair-wise `hash(hash(h1, h2), hash(h3, h4))`, or any configuration in between.

All these configurations can be represented as a free tree (a connected graph with no cycles), where the leaves are the hash inputs (h1, h2, ...) plus the root, and the internal nodes represent the hashing of values together. For RPO, which has a 2-to-1 ratio, every internal node has degree 3 (the two inputs and the one output), which makes the representation an unrooted binary tree. There are many such trees, but they all have one thing in common: for n leaves (the inputs plus the root) there are n-2 internal nodes. This is to say, regardless of the configuration used, there will always be n-2 hashes performed to compute the root when using RPO.

Given that there are many different configurations, but they all have the same number of hash calls, the preferred ordering IMO should be pair-wise, so that we have a proper tree (every node has either 0 or 2 children) with minimal path length, meaning that most operations can be performed in parallel. Using the example above, `hash(h1, h2)` and `hash(h3, h4)` can be done in parallel (see the sketch below).
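For concreteness, here is a hedged sketch of the two orderings over a non-empty list of digests, assuming the miden-crypto `Rpo256::merge` API. Both perform exactly the same number of merge calls; the pair-wise version just groups them into levels of independent calls.

```rust
use miden_crypto::hash::rpo::{Rpo256, RpoDigest};

/// Left to right: hash(hash(hash(h1, h2), h3), h4).
/// For k inputs this performs k - 1 strictly sequential merges.
fn fold_left(digests: &[RpoDigest]) -> RpoDigest {
    digests[1..]
        .iter()
        .fold(digests[0], |acc, d| Rpo256::merge(&[acc, *d]))
}

/// Pair-wise: hash(hash(h1, h2), hash(h3, h4)).
/// Still k - 1 merges in total, but grouped into ceil(log2(k)) levels whose
/// merges are independent of each other and could run in parallel.
fn fold_pairwise(digests: &[RpoDigest]) -> RpoDigest {
    let mut level: Vec<RpoDigest> = digests.to_vec();
    while level.len() > 1 {
        level = level
            .chunks(2)
            .map(|pair| {
                if pair.len() == 2 {
                    Rpo256::merge(&[pair[0], pair[1]])
                } else {
                    pair[0] // an odd digest is carried up to the next level
                }
            })
            .collect();
    }
    level[0]
}
```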