PMMR segment creation and validation #3453

Merged
merged 35 commits into from
Nov 17, 2020

Conversation

jaspervdm
Contributor

@jaspervdm jaspervdm commented Sep 27, 2020

This PR implements a (P)MMR "segment", defined by this WIP RFC. In short, a segment is a set of 2**b (with variable b) consecutive leaves along with the necessary data to reconstruct the subtree root and data to verify membership of the subtree in the original MMR (a Merkle proof). In case of a prunable MMR, the segment only contains the unpruned leaves in the segment range and also contains intermediary MMR hashes that are necessary to construct the segment root.

Concretely, a segment consists of the following elements:

  • Segment identifier: b, the base-2 logarithm of the number of leaves in a full segment (before pruning), and a zero-based segment index idx.
  • List of intermediary hashes, sorted by MMR position in ascending order. It is only present for prunable MMRs, where it contains the unpruned hashes in the segment range that are necessary to reconstruct the segment root. As of this PR it contains all unpruned hashes above the leaves, which may include redundant data. This is something we can improve over time.
  • List of unpruned leaves in the segment (leaf index range [idx*2**b, (idx+1)*2**b)), sorted by MMR position in ascending order.
  • Segment merkle proof, required to reproduce the MMR root starting with the segment root, thereby proving membership.

Given that a full segment contains a number of leaves that is a power of 2, it forms a full subtree in the MMR and as such has a single root. The final segment possibly has fewer than 2**b leaves. In this case the peaks in the segment are also peaks in the full MMR and we define its root as these peaks bagged together.
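To make the identifier concrete, here is a minimal sketch of how (b, idx) maps to a leaf index range, including the clamped final segment. The `SegmentId` type and its field and method names are assumptions made for this illustration, not the types introduced by the PR.

```rust
/// Hypothetical segment identifier: `height` is b and `idx` is the
/// zero-based segment index. Names are assumptions for this sketch.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub struct SegmentId {
    pub height: u8,
    pub idx: u64,
}

impl SegmentId {
    /// Number of leaves in a full segment, 2**b. Assumes `height < 64`.
    pub fn capacity(&self) -> u64 {
        1u64 << self.height
    }

    /// Half-open leaf index range `[start, end)` covered by the segment in
    /// an MMR with `n_leaves` leaves (counted before pruning). The end is
    /// clamped for the final segment; `None` means the segment lies
    /// entirely beyond the last leaf.
    pub fn leaf_range(&self, n_leaves: u64) -> Option<(u64, u64)> {
        let start = self.idx.checked_mul(self.capacity())?;
        if start >= n_leaves {
            return None;
        }
        let end = start.saturating_add(self.capacity()).min(n_leaves);
        Some((start, end))
    }

    /// True when the segment holds all 2**b leaves, so it forms a full
    /// subtree with a single root.
    pub fn is_full(&self, n_leaves: u64) -> bool {
        matches!(self.leaf_range(n_leaves), Some((s, e)) if e - s == self.capacity())
    }
}
```

With 11 leaves and b = 2, segments 0 and 1 are full and segment 2 is the partially filled final segment covering leaves 8 to 10. The starting leaf index is simply `idx * 2**b`, so translating between the two ways of identifying a segment is cheap.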

Segment creation
A segment is created by looping over all the MMR positions in the segment range. If the position is an unpruned leaf or its sibling is an unpruned leaf, add it to the leaf data list. If not, check if its hash is unpruned and add it to the list of hashes. Next, generate the merkle proof by filling a list of hashes with:

  1. the siblings along the path from the subtree root (final MMR position of the segment) up to its corresponding peak in the MMR
  2. peaks to the right of our subtree root, bagged together to a single hash
  3. peaks to the left (from right to left) with a position smaller than the first MMR position of the segment

Note that this procedure will also behave as expected for the partially filled final segment, since in that case steps 1 and 2 will not produce any hashes and step 3 will give us all the other peaks in the MMR.
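The three proof steps can be sketched in terms of MMR positions alone. The code below is an illustration under stated assumptions, not the PR's implementation: positions and leaf indices are zero-based, `n_leaves` is assumed to be below 2**62, and `ProofLayout`, `proof_layout`, `peaks` and `leaf_pos` are hypothetical names. It only computes which positions a creator would read; step 2 would then bag the `right_peaks` hashes into one.

```rust
/// 0-based MMR position of the leaf with 0-based index `leaf_idx`.
fn leaf_pos(leaf_idx: u64) -> u64 {
    2 * leaf_idx - u64::from(leaf_idx.count_ones())
}

/// `(first_leaf, height, peak_pos)` for every peak, left to right.
fn peaks(n_leaves: u64) -> Vec<(u64, u32, u64)> {
    let mut out = Vec::new();
    let mut first_leaf = 0u64;
    for h in (0..62u32).rev() {
        if n_leaves & (1u64 << h) != 0 {
            let peak_pos = leaf_pos(first_leaf) + (1u64 << (h + 1)) - 2;
            out.push((first_leaf, h, peak_pos));
            first_leaf += 1u64 << h;
        }
    }
    out
}

/// Positions whose hashes make up the proof (hypothetical type).
#[derive(Debug, Default, PartialEq)]
pub struct ProofLayout {
    /// Step 1: siblings from the segment root up to its peak.
    pub path_siblings: Vec<u64>,
    /// Step 2: peaks right of the segment's peak (bagged into one hash).
    pub right_peaks: Vec<u64>,
    /// Step 3: peaks left of the segment, ordered right to left.
    pub left_peaks: Vec<u64>,
}

pub fn proof_layout(height: u32, idx: u64, n_leaves: u64) -> Option<ProofLayout> {
    let first_leaf = idx << height;
    if first_leaf >= n_leaves {
        return None;
    }
    let all_peaks = peaks(n_leaves);
    let mut layout = ProofLayout::default();
    // Only a full segment has a single root with a path to a peak.
    if first_leaf + (1u64 << height) <= n_leaves {
        let i = all_peaks
            .iter()
            .position(|&(f, h, _)| first_leaf < f + (1u64 << h))?;
        let peak_height = all_peaks[i].1;
        let mut pos = leaf_pos(first_leaf) + (1u64 << (height + 1)) - 2;
        for h in height..peak_height {
            let width = (1u64 << (h + 1)) - 1; // nodes in a subtree of height h
            if (first_leaf >> h) & 1 == 0 {
                layout.path_siblings.push(pos + width); // we are the left child
                pos += width + 1;
            } else {
                layout.path_siblings.push(pos - width); // we are the right child
                pos += 1;
            }
        }
        layout.right_peaks = all_peaks[i + 1..].iter().map(|p| p.2).collect();
    }
    let seg_first_pos = leaf_pos(first_leaf);
    layout.left_peaks = all_peaks
        .iter()
        .rev()
        .map(|p| p.2)
        .filter(|&p| p < seg_first_pos)
        .collect();
    Some(layout)
}
```

For the partially filled final segment the first two lists stay empty and only `left_peaks` is filled, which matches the note above.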

Segment verification
Iterate over all the MMR positions in the segment range. If the position is a leaf, check (for prunable MMRs) whether the element is expected to be present, based on the presence of the element or its sibling in the bitmap. If it is, take it from the list of leaves, hash it and store the hash in a list of temporary hashes.
For all other positions: if neither child is present in the list of temporary hashes (i.e. they are both pruned), do nothing. If one or both children are present, hash them together and store the new hash in the list of temporary hashes. If only one of the children is present, obtain the hash of the other child from the list of hashes in the segment instead.
After looping through all positions, we are either left with 1 entry (full segment) or multiple entries (partially filled, final segment) in the list of temporary hashes. If there are multiple, bag them together. We are left with the segment root.
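A minimal sketch of the root reconstruction for a full segment follows. It makes several simplifying assumptions that are not taken from the PR: a toy 64-bit hash built on the standard library's `DefaultHasher` stands in for the real hash, positions are zero-based, the traversal is written recursively instead of as a loop over positions, and a leaf counts as present when it appears in the segment's leaf list rather than being checked against the bitmap. All type and function names are hypothetical.

```rust
use std::collections::hash_map::DefaultHasher;
use std::collections::BTreeMap;
use std::hash::{Hash, Hasher};

/// Toy 64-bit hash standing in for the real hash type (assumption).
type ToyHash = u64;

fn hash_leaf(pos: u64, data: &[u8]) -> ToyHash {
    let mut h = DefaultHasher::new();
    (pos, data).hash(&mut h);
    h.finish()
}

fn hash_parent(pos: u64, left: ToyHash, right: ToyHash) -> ToyHash {
    let mut h = DefaultHasher::new();
    (pos, left, right).hash(&mut h);
    h.finish()
}

/// 0-based position of the root of the subtree covering leaves
/// `[first_leaf, first_leaf + 2**height)`.
fn subtree_root_pos(first_leaf: u64, height: u32) -> u64 {
    let leaf_pos = 2 * first_leaf - u64::from(first_leaf.count_ones());
    leaf_pos + (1u64 << (height + 1)) - 2
}

/// Hypothetical segment body: both maps are keyed by MMR position.
#[derive(Default)]
pub struct Segment {
    pub hashes: BTreeMap<u64, ToyHash>,
    pub leaves: BTreeMap<u64, Vec<u8>>,
}

/// Returns `Ok(None)` when the whole subtree is pruned, and an error when
/// a hash needed to combine with an unpruned child is missing.
pub fn subtree_root(
    seg: &Segment,
    first_leaf: u64,
    height: u32,
) -> Result<Option<ToyHash>, String> {
    let pos = subtree_root_pos(first_leaf, height);
    if height == 0 {
        return Ok(seg.leaves.get(&pos).map(|data| hash_leaf(pos, data)));
    }
    let left_pos = subtree_root_pos(first_leaf, height - 1);
    let right_pos = pos - 1; // the right child directly precedes its parent
    let left = subtree_root(seg, first_leaf, height - 1)?;
    let right = subtree_root(seg, first_leaf + (1u64 << (height - 1)), height - 1)?;
    if left.is_none() && right.is_none() {
        return Ok(None); // both children pruned: nothing to compute here
    }
    let stored = |p: u64| {
        seg.hashes
            .get(&p)
            .copied()
            .ok_or_else(|| format!("missing hash at position {}", p))
    };
    let left = match left {
        Some(h) => h,
        None => stored(left_pos)?,
    };
    let right = match right {
        Some(h) => h,
        None => stored(right_pos)?,
    };
    Ok(Some(hash_parent(pos, left, right)))
}
```

For the partially filled final segment, the same function would be applied to each peak inside the segment and the results bagged together.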

Next, verify the proof by attempting to reproduce the MMR root hash: first, hash the segment root together with the siblings along the path from the subtree root to its peak in the MMR, then hash the result together with the bagged peaks to the right, and finally hash it together with the peaks on the left (going right to left). Verification passes if the calculated hash is equal to the MMR root hash.
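The proof check can be sketched as a single fold over the proof hashes. This is an illustration under assumptions rather than the PR's code: the hash is a toy 64-bit stand-in, `hash_with_index` mixes an index into each hash (the parent position for tree nodes, the MMR size when bagging peaks), positions are zero-based, and `n_leaves` is assumed to be below 2**62.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

/// Toy 64-bit hash standing in for the real hash type (assumption).
type ToyHash = u64;

fn hash_with_index(index: u64, left: ToyHash, right: ToyHash) -> ToyHash {
    let mut h = DefaultHasher::new();
    (index, left, right).hash(&mut h);
    h.finish()
}

/// Recompute the MMR root from a segment root and its proof hashes, in the
/// order: path siblings, bagged right peaks, left peaks (right to left).
pub fn verify_proof(
    seg_root: ToyHash,
    height: u32,
    idx: u64,
    n_leaves: u64,
    proof: &[ToyHash],
    mmr_root: ToyHash,
) -> bool {
    let first_leaf = idx << height;
    if first_leaf >= n_leaves {
        return false;
    }
    let mmr_size = 2 * n_leaves - u64::from(n_leaves.count_ones());
    let mut hashes = proof.iter().copied();
    let mut acc = seg_root;
    if first_leaf + (1u64 << height) <= n_leaves {
        // Full segment: locate the peak whose subtree contains it.
        let (mut peak_first, mut peak_height) = (0u64, 0u32);
        for h in (0..62u32).rev() {
            if n_leaves & (1u64 << h) != 0 {
                if first_leaf < peak_first + (1u64 << h) {
                    peak_height = h;
                    break;
                }
                peak_first += 1u64 << h;
            }
        }
        let leaf_pos = 2 * first_leaf - u64::from(first_leaf.count_ones());
        let mut pos = leaf_pos + (1u64 << (height + 1)) - 2;
        for h in height..peak_height {
            let sibling = match hashes.next() {
                Some(s) => s,
                None => return false,
            };
            if (first_leaf >> h) & 1 == 0 {
                pos += 1u64 << (h + 1); // left child: parent follows the sibling
                acc = hash_with_index(pos, acc, sibling);
            } else {
                pos += 1; // right child: parent is the next position
                acc = hash_with_index(pos, sibling, acc);
            }
        }
        if peak_first + (1u64 << peak_height) < n_leaves {
            // There are peaks to the right; expect them as one bagged hash.
            match hashes.next() {
                Some(right) => acc = hash_with_index(mmr_size, acc, right),
                None => return false,
            }
        }
    }
    // Remaining hashes are the left peaks, ordered right to left.
    for left in hashes {
        acc = hash_with_index(mmr_size, left, acc);
    }
    acc == mmr_root
}
```

For a partially filled final segment, `seg_root` is already the bag of the peaks inside the segment, so only the left peaks are folded in.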

Open points before merging

  • In case of a heavily pruned MMR, is this data always sufficient to reconstruct the full pruned MMR, or would we miss any intermediary hashes? Resolved: it is sufficient, now that we added support for fully pruned segments.
  • I think we need a function to extract the intermediary and proof hashes for purposes of storing them in the MMR we are building. Resolved: TBD in a future PR.
  • Would it be more natural to pass in a bitmap indicating the spent or the unspent positions? Resolved: unspent.
  • Is the bitmap/leaf index 0-based (should be a quick check)? Resolved: yes, it is.

@jaspervdm jaspervdm changed the title from "[WIP] PMMR chunk generation and validation" to "PMMR segment generation and validation" Oct 6, 2020
@jaspervdm jaspervdm changed the title from "PMMR segment generation and validation" to "PMMR segment creation and validation" Oct 6, 2020
@jaspervdm jaspervdm requested a review from antiochp October 7, 2020 15:14
@antiochp
Member

Quick question that occurred to me reading over the PR description (nice description by the way!) -

Segment identifier: b, the 2-log of the size of the number of leaves (before pruning) and a zero-based segment index idx

Does it potentially make more sense to provide a "starting leaf index" as segment identifier? The current "segment index" is heavily dependent on the "segment size". Clearly you can translate between these easily enough but it may make sense to identify these based on both number of leaves via b and leaf idx.

@antiochp
Member

List of unpruned leaves in the segment (leaf index range [i*2**b, (i+1)*2**b)), sorted by MMR position in ascending order.

Is the intention to have these segments be self-contained? For pruned MMR (outputs and rangeproofs) do we also provide the bitmap or is the assumption we already have the corresponding bitmap segment?

@antiochp
Member

(apologies for the barrage of questionable feedback here...) 😄

@jaspervdm
Contributor Author

Not at all! I think they highlight some of the subtleties in the PR so it is good that they are discussed explicitly.

@antiochp
Member

antiochp commented Oct 28, 2020

Just posting here for reference -

  • We discussed "empty" segments. Proposal is to return a single root (found by recursing up the MMR from the empty segment). This provides all necessary hashes for reconstruction, even across a heavily pruned MMR, with pruning going beyond segment size. The root position will exist outside the defined segment.

  • There is an optimization in the above, if the requester can specify the segment size/height. Given the output bitmap they can determine which segments are empty and request larger segments (output and rangeproof) that cover multiple empty segments (or combination of empty and adjacent non-empty segments) of the MMR.
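As a rough illustration of the optimization, the sketch below finds the segments of a given height whose leaves are all spent according to the bitmap. The bitmap is modelled as a slice of booleans indexed by leaf (true meaning unspent), and the function name is hypothetical. Whether such a segment is actually pruned on the responding node also depends on compaction, which this sketch ignores.

```rust
/// Indices of the segments of size 2**height whose leaves are all spent
/// according to `unspent` (one flag per leaf, true = unspent).
pub fn all_spent_segments(unspent: &[bool], height: u32) -> Vec<u64> {
    let size = 1usize << height;
    unspent
        .chunks(size)
        .enumerate()
        .filter(|(_, chunk)| chunk.iter().all(|&u| !u))
        .map(|(i, _)| i as u64)
        .collect()
}
```

If segments 2 and 3 at height 1 are both reported, a requester could ask for segment 1 at height 2 instead, covering both empty segments with one request.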

@jaspervdm
Contributor Author

jaspervdm commented Oct 29, 2020

Fixed an edge case: if there is an uneven number of leaves, and the final leaf is spent, we still require it to be present in the final segment. Previously we only checked the bitmap for it and its (non-existent) sibling, which are both 0. This led us to assume they were pruned, but this is not the case.

This was actually caught by the pruning test in store/tests/segment.rs, except for the fact that there was a bug in the test itself related to the bitmap indices. Fixing the bug in the test made the test fail as it should have.
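The fixed rule can be sketched as a small predicate. This is an assumption-laden illustration: the bitmap is modelled as a slice of booleans with one entry per leaf (true meaning unspent), and `leaf_expected` is a hypothetical name, not the function used in the PR.

```rust
/// Whether the leaf at `leaf_idx` is expected to be present in a segment,
/// given the unspent bitmap (one flag per leaf, true = unspent).
pub fn leaf_expected(unspent: &[bool], leaf_idx: usize) -> bool {
    let n_leaves = unspent.len();
    if leaf_idx >= n_leaves {
        return false;
    }
    let sibling = leaf_idx ^ 1;
    if sibling >= n_leaves {
        // Final leaf of an odd-sized MMR: it has no sibling yet, so it is
        // still required even when the bitmap marks it as spent.
        return true;
    }
    // A leaf is kept when it or its sibling is unspent.
    unspent[leaf_idx] || unspent[sibling]
}
```

With five leaves of which only the first is unspent, leaves 0 and 1 are expected (they are siblings), leaves 2 and 3 are not, and the spent final leaf 4 is still expected.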

@jaspervdm
Contributor Author

@antiochp

  • We discussed "empty" segments. Proposal is to return a single root (found by recursing up the MMR from the empty segment). This provides all necessary hashes for reconstruction, even across a heavily pruned MMR, with pruning going beyond segment size. The root position will exist outside the defined segment.

We now support pruned segments. The root() function now returns Option<Hash>, where a None indicates a full segment that is completely pruned. In this situation the root (or one of its parents) needs to be obtained from the list of hashes in the segment. This is done in first_unpruned_parent().

In order to find the first parent up the path to the peak that is unpruned, the new first_unpruned_parent() function can be used. If the segment is not fully pruned, it will return (hash, None) where the hash is the root of the segment. If the segment is fully pruned, it will return (hash, Some(pos)) where the hash is the hash of the first parent that isn't compacted away, and pos is the corresponding position.

I've also added a bunch of tests to make sure it behaves as expected for full segments and doesn't affect the partially filled final segment.

@antiochp
Member

antiochp commented Nov 3, 2020

👍 Sounds good - I'm planning to take a closer look at this today.

@antiochp
Member

antiochp commented Nov 9, 2020

This looks good. Want to move it out of Draft status?

Just one minor point/question -

In order to find the first parent up the path to the peak that is unpruned, the new first_unpruned_parent() function can be used. If the segment is not fully pruned, it will return (hash, None) where the hash is the root of the segment. If the segment is fully pruned, it will return (hash, Some(pos)) where the hash is the hash of the first parent that isn't compacted away, and pos is the corresponding position.

Do we need to have None vs Some(pos) here for this?
I wonder if we could simply do (hash, pos) consistently for both scenarios? If it's the "real" root of the subtree then it's just the pos of the root. If it's a fully pruned subtree then pos is just a higher-up parent pos. Is there an advantage to only including an optional pos?

@jaspervdm
Contributor Author

You are right, we don't really need to return a (hash, Some(pos)), will update it to (hash, pos). For conceptual clarity I'd like to keep the Option<Hash> on the root() function though.
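A minimal sketch of the agreed shape, where the segment root stays optional but the lookup always yields a (hash, pos) pair, might look as follows. The signature, the toy hash type, the zero-based positions and the explicit `peak_height` parameter are all assumptions made for this illustration and may differ from the merged code.

```rust
use std::collections::BTreeMap;

/// Toy 64-bit hash standing in for the real hash type (assumption).
type ToyHash = u64;

/// `root` is the segment root, or `None` for a fully pruned full segment.
/// `hashes` holds the segment's intermediary hashes keyed by MMR position.
pub fn first_unpruned_parent(
    root: Option<ToyHash>,
    first_leaf: u64,
    height: u32,
    peak_height: u32,
    hashes: &BTreeMap<u64, ToyHash>,
) -> Option<(ToyHash, u64)> {
    let leaf_pos = 2 * first_leaf - u64::from(first_leaf.count_ones());
    let mut pos = leaf_pos + (1u64 << (height + 1)) - 2; // segment root position
    if let Some(hash) = root {
        return Some((hash, pos)); // the "real" root and its own position
    }
    let mut h = height;
    loop {
        if let Some(hash) = hashes.get(&pos) {
            return Some((*hash, pos));
        }
        if h >= peak_height {
            return None; // reached the peak without finding a stored hash
        }
        // Move to the parent: a left child skips over its sibling subtree,
        // a right child is directly followed by its parent.
        pos += if (first_leaf >> h) & 1 == 0 { 1u64 << (h + 1) } else { 1 };
        h += 1;
    }
}
```

In both scenarios the caller receives a hash together with the position it belongs to, so no separate optional position is needed.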

@jaspervdm jaspervdm marked this pull request as ready for review November 10, 2020 12:10
Member

@antiochp antiochp left a comment


I think we're looking good here with this. 👍
What else is outstanding before we can merge?

@jaspervdm
Contributor Author

I'm working on the deser of bitmap segments in this PR, but it probably makes more sense to do that in a separate one. I think we can merge this.

@jaspervdm jaspervdm merged commit 8faba4e into mimblewimble:master Nov 17, 2020
@antiochp
Member

🎉

@jaspervdm jaspervdm deleted the pmmr_chunk branch November 26, 2020 17:58
@antiochp antiochp mentioned this pull request Nov 26, 2020
@yeastplume yeastplume mentioned this pull request Feb 22, 2022
26 tasks
bayk added a commit to mwcproject/mwc-node that referenced this pull request Jun 8, 2024
)

  • Chunk generation and validation
  • Rename chunk -> segment
  • Missed a few
  • Generate and validate merkle proof
  • Fix bugs in generation and validation
  • Add test for unprunable MMR of various sizes
  • Add missing docs
  • Remove unused functions
  • Remove segment error variant on chain error type
  • Simplify calculation by using a Vec instead of HashMap
  • Use vectors in segment definition
  • Compare subtree root during tests
  • Add test of segments for a prunable mmr
  • Remove assertion
  • Only send intermediary hashes for prunable MMRs
  • Get hash from file directly
  • Require both leaves if one of them is not pruned
  • More pruning tests
  • Add segment (de)serialization
  • Require sorted vectors in segment deser
  • Store pos and data separately in segment
  • Rename log_size -> height
  • Fix bitmap index in root calculation
  • Add validation function for output (bitmap) MMRs
  • Remove left over debug statements
  • Fix test
  • Edge case: final segment with uneven number of leaves
  • Use last_pos instead of segment_last_pos
  • Simplify pruning in test
  • Add leaf and hash iterators
  • Support fully pruned segments
  • Drop backend before deleting dir in pruned_segment test
  • Simplify output of first_unpruned_parent