Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

UTXO sums from output MMR (and proof of unspentness) #1733

Closed
antiochp opened this issue Oct 13, 2018 · 7 comments
Closed

UTXO sums from output MMR (and proof of unspentness) #1733

antiochp opened this issue Oct 13, 2018 · 7 comments

Comments

@antiochp
Copy link
Member

antiochp commented Oct 13, 2018

I think we should consider adding "sums" back to our output MMR "hashtree".
We used to have these when we had the original sumtrees but these were TXO sums.
But I think if we are clever about it we can sum the UTXO set (not simply the TXO set) and produce a useful set of peaks and overall root (the sum would be our utxo_sum that we store in the db in our block_sums).

3 files -

pmmr_hash.bin
pmmr_data.bin
pmmr_sums.bin

The hash and data files would act just like they do today, prunable (compactable) via the prune_list, with the leaf_set representing the unpruned leaves.

Note: the leaf_set represents the list of pos that make up the UTXO set in the case of the output MMR.

TXO vs. UTXO -

  • Leaf nodes represent the TXO set.
  • Leaf nodes that are not removed represent the UTXO set.

The sums file would behave very similar to the hash file but each parent node would be the sum of its children.

  • It would be prunable in the same way the hash file is.
  • But it would also be editable in place to support removing leaf nodes while having the sums work as the sum of the UTXO set.

Pruning a leaf node would involve -

  1. Removing the pos from the leaf_set.
  2. Updating the sums file to replace the output commitment with a commitment to zero value and recalculating sums up the path to the peak to adjust the root sum (the UTXO sum).

Compaction would still work - we can remove leaves once both leaves under a parent node are removed, the sums of the parent node would be a commitment to zero and all nodes in the path up to the peak would reflect this.

We would not need to commit to the UTXO set anywhere explicitly.
We commit to the root of the output MMR (TXO set) in each block header.
We also indirectly commit to the UTXO set by committing to the root of kernel MMR in each block header.

A Merkle proof against the current output MMR gives us "proof of inclusion" against the TXO set. We can prove an output exists in the TXO set.
But this does not give us any information about the output being spent or not.

The sums file lets us build a Merkle proof against the sums.
And if spent sums are represented as commitments to zero then I believe we can build a "proof of unspentness" from this.

It would not be a Merkle proof of hashes up to the root hash but a Merkle proof of sums up to the root sum.
[Or do these need to be hashes of the sums, where we can leverage hash_with_index to include position information?]

https://petertodd.org/2016/delayed-txo-commitments

Specifically TXO commitments proposes a Merkle Mountain Range1 (MMR), a type of deterministic, indexable, insertion ordered merkle tree, which allows new items to be cheaply appended to the tree with minimal storage requirements, just log2n “mountain tips”. Once an output is added to the TXO MMR it is never removed; if an output is spent its status is updated in place. Both the state of a specific item in the MMR, as well the validity of changes to items in the MMR, can be proven with log2n sized proofs consisting of a merkle path to the tip of the tree.

The key part here being -

if an output is spent its status is updated in place.

Our unspent/spent status being represented by the output commitment (or commitment to zero).

We effectively do all this currently via the utxo_sum on block_sums in the db.
At each block we know the overall sum of the UTXO set.

But we have no way of using this to prove anything about an individual output.
To check an output is unspent we need to take the full output MMR and the associated leaf_set and sum all the unspent outputs up based on pos from the leaf_set.

And this is one of the reasons full chain validation is expensive - we have to walk the entire set of leaves in the output MMR.
With a sums file we could maintain the sums in an efficient way (the MMR) and quickly calculate the overall UTXO sum at any point in time (the root of the sums tree).

@antiochp
Copy link
Member Author

antiochp commented Oct 13, 2018

And just had a thought - if you can "edit in place" in an MMR then could we not simply add an unspent boolean flag to the entries in the output MMR?

Spending an output would flip the flag true->false.
We would need to rehash up to the associated peak.

Rewind would do the opposite - for all leaf nodes being rewound from spent->unspent we would flip the flag back and rehash up to the peak as necessary.

A Merkle proof could show spent/unspent (you would need the flag to be set correctly for the hash to be correct) alongside its position in the MMR.

I'm thinking about this in terms of "lightweight clients" that may not want to or be able to maintain the full UTXO set locally.

Today we need the full UTXO set locally to know if an output can be spent or not.

@ignopeverell
Copy link
Contributor

That sounds like a good direction to explore. What would need some more thoughts is how to do this fast enough:

Updating the sums file to replace the output commitment with a commitment to zero value and recalculating sums up the path to the peak to adjust the root sum (the UTXO sum).

@antiochp
Copy link
Member Author

antiochp commented Oct 16, 2018

Related - https://www.youtube.com/watch?v=IMzLa9B1_3E&t=3522 (Benedikt Bunz talk on accumulators and UTXO proofs).

Also related - zcash/zcash#2134 (Accumulators vs. Merkle proofs).

@tromp
Copy link
Contributor

tromp commented Oct 17, 2018

Here's a related idea.
Consider the UTXO MMR overlaid on top of the TXO MMR, but with different hashes computed.
Instead of hash_with_index(i, lefthash, righthash) we compute
hash_with_index(i, ALLSPENT, righthash) if the left tree is all spent, and
hash_with_index(i, lefthash, ALLSPENT) if the right tree is all spent.
Here ALLSPENT is a constant 32-byte value, e.g. 0.
Then the root is a simple direct UTXO commitment allowing for utxo Merkle proofs.

@sesam
Copy link
Contributor

sesam commented Oct 17, 2018

Good stuff! Estimates on space/cpu usage? Mobile probably has more space than cpu. 32G slow micro-SD is already cheap.

What's teoretically the lightest possible unspent check? Could mobile wallets keep a partial mmr? Truncated hashes? Probabilistic filter? Bloom/inverted Bloom combo?
The client might pre-check, then download more filter/mmr data from peers, if safe.

@antiochp
Copy link
Member Author

Currently "removing" from the output MMR involves simply flipping a bit in the leaf_set representing the position of a leaf in the MMR.

At compaction time we then use all the removed leaf positions to determine which subtrees (mountains? hills?) can be pruned/compacted based. i.e. if two sibling nodes (leaves or otherwise) are removed then we can prune up to the parent.

We could feasibly modify the "remove" behavior to keep track of updated hashes (all the way up to the peak) in a separate cache of hash overrides.

And then at compaction time actually persist these updates in the MMR itself.

That way we are not constantly doing random writes to the MMR hash file. We batch them all up and apply all updates when we compact the hash file.

@antiochp
Copy link
Member Author

antiochp commented Feb 1, 2019

Closing this for now.

@antiochp antiochp closed this as completed Feb 1, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

4 participants