Investigate UTXO obsolescence #763

Closed
ignopeverell opened this issue Mar 9, 2018 · 17 comments

@ignopeverell
Contributor

We have these nice MMRs lying around to help secure our chain and we haven't put them to too much use yet. Here's a short outline of what could be a proposal:

  • After a year, nodes can remove UTXO data (commitment, switch, range proof).
  • To spend them, the Merkle proof needs to be provided, along with the commitment (and potentially switch).
  • To compensate for that larger transaction size, additional fees may apply.
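
For illustration, a minimal sketch of the Merkle-proof check such a spend would rest on, assuming nodes keep the historical output MMR roots after discarding leaf data. The toy hash and all names are illustrative, not Grin's actual API:

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

// Toy hash standing in for Grin's blake2b.
fn hash_pair(left: u64, right: u64) -> u64 {
    let mut h = DefaultHasher::new();
    (left, right).hash(&mut h);
    h.finish()
}

/// Recompute the root from a leaf (e.g. the hashed commitment) and a path
/// of (sibling, sibling_is_right) pairs supplied by the spender.
fn verify_merkle_proof(leaf: u64, path: &[(u64, bool)], root: u64) -> bool {
    let mut acc = leaf;
    for &(sibling, sibling_is_right) in path {
        acc = if sibling_is_right {
            hash_pair(acc, sibling)
        } else {
            hash_pair(sibling, acc)
        };
    }
    acc == root
}

fn main() {
    // Two-leaf tree: the node only keeps `root`; the spender supplies the
    // leaf and its sibling hash.
    let (leaf, sibling) = (1u64, 2u64);
    let root = hash_pair(leaf, sibling);
    assert!(verify_merkle_proof(leaf, &[(sibling, true)], root));
}
```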

There are multiple benefits to UTXO obsolescence:

  • Limits the impact of spam attacks.
  • Provides an incentive for wallets to move UTXOs regularly and consolidate them (given that it'd be cheaper to do so than to spend after obsolescence).
  • The UTXOs actually still exist, so HODLers aren't meaningfully penalized (the additional fees would presumably still be negligible in that scenario).
@antiochp
Member

antiochp commented Mar 12, 2018

@ignopeverell How do we know an output is unspent if all we have is a Merkle proof and the commitment?
I saw a commitment to a UTXO bitset mentioned on the ML/gitter - is this involved here?

For coinbase outputs today we can use the Merkle proof to prove how "old" the output is - but nothing about spent/unspent. We need to look in the MMR for this.
But if the UTXO gets pruned from the MMR as described here...

I guess the same question but asked differently - how do we prevent double spends if nodes have pruned the year old UTXO out of the MMR?

@ignopeverell
Contributor Author

Sorry, I hadn't fully thought this through when I filed this; it was more of a placeholder to refine later. But I think there are a couple of possible approaches.

  • We only remove the UTXO data. One wouldn't even need a Merkle proof to spend afterward; all that's required is the feature field, the commitment and the switch hash.
  • We fully prune. This would require a Merkle proof and some additional proof of unspentness. If we were to add a UTXO bitset, the node would already have such a proof at hand, so it wouldn't need to be provided.
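
A toy sketch of the difference between the two approaches (all names and fields hypothetical, not Grin's actual types):

```rust
// Hypothetical sketch, not Grin's actual types.
enum ObsoleteSpend {
    /// Approach 1: only the UTXO *data* was removed; the MMR leaf remains,
    /// so the spender simply re-supplies what nodes discarded.
    DataPruned {
        features: u8,
        commitment: [u8; 33],
        switch_hash: [u8; 32],
    },
    /// Approach 2: the leaf was fully pruned; the spender must also prove
    /// membership in the output MMR. Unspentness would come from e.g. a
    /// committed UTXO bitset, which the node keeps itself, so no extra
    /// proof would be required from the spender in that case.
    FullyPruned {
        commitment: [u8; 33],
        merkle_path: Vec<[u8; 32]>,
    },
}

fn main() {
    // After full pruning the spender must carry the Merkle path themselves.
    let _spend = ObsoleteSpend::FullyPruned {
        commitment: [0u8; 33],
        merkle_path: vec![[0u8; 32]; 20],
    };
}
```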

Does that make more sense?

@antiochp
Member

Yeah, thanks for that.
I think the benefit of having what would effectively be an upper limit on the size of the output MMR makes something like this really compelling.

My only hesitation is that some of these options mean we end up with two alternative ways of spending outputs (two ways to validate them, two ways to build them, edge cases where they potentially interact, etc.), and the complexity goes up.
But it feels like the benefits make it worth it.

Being able to say the output MMR on a pruned node is capped at x and everything still works would be awesome.
And that pruning works efficiently regardless of old legacy outputs, and it's not just an idealized scenario that never actually happens in practice.

@0xmichalis
Contributor

0xmichalis commented Apr 22, 2019

Wallets would have to store the obsolete UTXO data they care about, and that data cannot be restored if lost, right? Wallets would have to warn when they handle obsolete data, and automation could be provided to self-send UTXOs periodically (though ideally it should be possible to opt out).

More benefits to such a scheme:

  • enhanced network privacy since more transactions are incentivized
  • right to be forgotten can actually be exercised
  • presumably more lightweight chain

@jaspervdm
Contributor

Some nodes would still need to keep the UTXO and rangeproof information around, since anyone joining the network needs to check the rangeproofs. So introducing this feature would split nodes into two types: archival and non-archival. Right now all nodes are archival nodes, in the sense that anyone can do a full state sync from any node.

@0xmichalis
Contributor

@jaspervdm wouldn't it suffice for those missing proofs to be validated when their owners try to spend them? My understanding is that the roots of the MMRs that track all UTXO data are still kept in the blockchain, so new nodes should be able to validate transactions that try to spend obsolete UTXOs if they are presented with valid Merkle proofs. Why wouldn't that work?

@jaspervdm
Contributor

If somewhere in the past someone managed to create two outputs, one of -10 grin and another of +10 grin, they can now freely spend the +10 grin output. The -10 output would just never get spent.
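
For concreteness, a hedged sketch of why the sums alone can't catch this, writing Pedersen commitments as C = rG + vH (symbols illustrative):

```latex
C_{-} = r_1 G - 10H, \qquad C_{+} = r_2 G + 10H
% summing the pair cancels the amounts entirely:
C_{-} + C_{+} = (r_1 + r_2)\,G + 0 \cdot H
% so the kernel sum check still balances; only the rangeproof on C_{-}
% (proving 0 \le v < 2^{64}) would expose the negative amount.
```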

@0xmichalis
Contributor

they can now freely spend the +10 grin output. The -10 output would just never get spent

Wouldn't one be able to spend the +10 output anyway as it stands today?

@antiochp
Member

antiochp commented Apr 26, 2019

Wouldn't one be able to spend the +10 output anyway as it stands today?

The -10 would not have a valid rangeproof associated with it so neither of -10/+10 could be created.

Today every node validates every rangeproof at creation time.
If we only verified the rangeproof at "spend" time, then you would just never spend the -10.

@0xmichalis
Contributor

The -10 would not have a valid rangeproof associated with it so neither of -10/+10 could be created.

What would it take for an attacker to add such an output to the chain? Would it be enough to mine it in a block, even with a minority of the hashrate? Or would the invalid output invalidate the block, so rational/honest miners wouldn't mine on it? In that case, in order for an attacker to obsolete a -10/+10 tx, they would have to control 51% of the hashrate over the course of a year (obsolescence time - creation time), right?

Today every node validates every rangeproof at creation time.

I don't think we would stop validating at creation in any case.

@antiochp
Member

I don't think we would stop validating at creation in any case.

The issue here is that you cannot actually guarantee this in the face of a large re-org.

Say we are at height 1,000 and the "fast sync" horizon is 100.
All nodes will have full blocks back to height 900.

If a re-org occurs beyond the horizon, say to 895', then nodes will:

  • process headers from 895' to the new re-org chain tip (say 1000' for simplicity)
  • ask for txhashset (UTXO set) at 900' (the horizon on the re-org chain)
  • process full blocks from 901' to 1000'

But note: We do not see the full blocks between [895', .., 899'] (between the point the re-org occurred and our fast sync horizon).
We do not validate any txs that appear in those blocks.
We do validate (during fast sync) that the UTXO set at height 900' is consistent with the kernel sums. We just don't care what happened prior to this point.
And it is safe to "not care" because we know that every output in the UTXO set is valid (it has a valid rangeproof). If the UTXO set contained a +10/-10 it would not validate successfully.

  • Realtime validators validate txs at creation time.
  • Historical validators validate the txhashset and a subset of recent full blocks.
  • A large re-org can force a node into a historical validator role -
    • txs prior to the horizon cannot be validated
    • we can only validate outputs in the resulting UTXO set
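
A hedged sketch of that sync decision (constants and names illustrative, not Grin's actual sync code):

```rust
// Hedged sketch, not Grin's sync code. Horizon of 100 as in the example.
const HORIZON: u64 = 100;

enum SyncPlan {
    /// Fork within the horizon: replay full blocks from the fork point.
    FullBlocks { from: u64 },
    /// Fork beyond the horizon: fetch the txhashset (UTXO set) at the new
    /// horizon and replay full blocks only from there. Blocks between the
    /// fork point and the horizon are never seen, let alone validated.
    TxHashset { horizon: u64, full_blocks_from: u64 },
}

fn plan_sync(new_tip: u64, fork_point: u64) -> SyncPlan {
    let horizon_height = new_tip.saturating_sub(HORIZON);
    if fork_point >= horizon_height {
        SyncPlan::FullBlocks { from: fork_point + 1 }
    } else {
        SyncPlan::TxHashset {
            horizon: horizon_height,
            full_blocks_from: horizon_height + 1,
        }
    }
}

fn main() {
    // The example above: re-org to 895' with a new tip at 1000'.
    match plan_sync(1000, 895) {
        SyncPlan::TxHashset { horizon, full_blocks_from } => {
            println!("fetch UTXO set at {horizon}', replay from {full_blocks_from}'");
        }
        SyncPlan::FullBlocks { from } => println!("replay from {from}'"),
    }
}
```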

@antiochp
Member

Crossposting this here for discussion - https://www.grin-forum.org/t/aggregate-merkle-proofs/4948

@0xmichalis
Contributor

0xmichalis commented Apr 29, 2019

@antiochp going back to your comment re deep re-orgs beyond the horizon, I am not sure how that's relevant to UTXO obsolescence. Presumably, in order for a deep re-org to be a problem with obsolete UTXOs, it would have to be deeper than (obsolescence time - creation time). What am I missing?

@antiochp
Member

Presumably, in order for a deep re-org to be a problem with obsolete utxos, it would have to be deeper than (obsolescence time - creation time). What am I missing?

I don't think you're missing anything. It would mean treating the obsolescence threshold as an implicit "checkpoint", though, I think.

Right now we can (at least in theory) reorg to any height, all the way back to genesis.
I'm not saying the chain would survive a reorg like that, but "checkpoints or no checkpoints" is a fundamental decision about which chain is the "real chain". Is it the chain with the most work, or the chain with the most work after some arbitrary checkpoint?

If we reorg past the obsolescence threshold then we reorg into "obsolete outputs" territory and not every node has the ability to validate the UTXO set at that point.

@jaspervdm Every lightweight node could track a "sum of obsolete UTXOs". This would allow the sums to sum correctly w.r.t. the kernels. This would not solve the missing rangeproof problem though. But spending an "obsolete" output requires both a Merkle proof of its place in the outputs MMR and its associated rangeproof.

For an output to be in the obsolete set it must have originally been in the "non-obsolete" UTXO set so must have had its rangeproof verified at creation time.

Is there still a case where a large enough reorg could introduce obsolete outputs where the rangeproof has never been verified?
Malicious miner produces a massive reorg, generating an obsolete -1,000,000 output with an offsetting +1,000,000 and associated rangeproof.
The -1,000,000 would simply never get spent and no node would be any the wiser.
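
A toy scalar model of the sum check in question, with i64 amounts standing in for curve points (all names hypothetical). It shows the sums balancing even with a hidden -1,000,000 in the obsolete set, which is why the missing rangeproof, not the sum, is the problem:

```rust
// Toy scalar model: i64 amounts stand in for Pedersen commitments, so the
// "sum check" is plain addition. In the real system these are curve points
// and the amounts are never visible.
fn sums_balance(utxo_sum: i64, obsolete_sum: i64, expected_supply: i64) -> bool {
    utxo_sum + obsolete_sum == expected_supply
}

fn main() {
    // Honest chain: current UTXOs worth 950, obsolete outputs worth 50.
    assert!(sums_balance(950, 50, 1000));

    // Chain hiding an unverified -1,000,000 obsolete output offset by a
    // +1,000,000 spendable one: the pair cancels, so the check still passes.
    assert!(sums_balance(950 + 1_000_000, 50 - 1_000_000, 1000));

    println!("both chains pass the sum check; only rangeproofs tell them apart");
}
```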

@antiochp
Member

antiochp commented May 30, 2019

The conversation yesterday about "old and new style bulletproofs" got me thinking about "obsolete outputs".
If we obsolete them after a certain threshold then this would potentially give us natural cutoff points for introducing compatibility-breaking changes between old and new outputs (or rangeproofs).

We could do something along the lines of the "turnstile" in ZCash - where pre-turnstile outputs are simply not spendable until they pass through the turnstile. In our case obsolete outputs would need to be refreshed (and converted into new-style bulletproofs or whatever was required).

We would not necessarily need to support both old and new in all regular txs. The conversion could be a specific process (spend one old output and replace it with a new one, for example).

ZCash detects inflation this way by forcing everything to pass through the turnstile via non-shielded addresses. We can't do this, so we still have the problem of nodes not knowing whether the set of obsolete outputs has caused inflation (the -ve output with missing rangeproof example above).
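
A hedged sketch of what such a turnstile rule might look like (hypothetical types, not a concrete proposal):

```rust
// Hypothetical types; a sketch, not a concrete proposal.
enum RangeproofVersion {
    Old,
    New,
}

struct Output {
    version: RangeproofVersion,
}

/// Regular txs may only spend new-style outputs; old (obsolete) outputs
/// must first pass the "turnstile": a dedicated conversion tx that spends
/// the old output and creates a new-style replacement.
fn is_spendable_in_regular_tx(out: &Output) -> bool {
    matches!(out.version, RangeproofVersion::New)
}

fn main() {
    let old = Output { version: RangeproofVersion::Old };
    assert!(!is_spendable_in_regular_tx(&old));
    // `old` would first need a conversion tx producing a New-style output.
}
```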

@antiochp
Member

antiochp commented Jun 4, 2019

Just to add to what @jaspervdm was saying earlier.

Even if we split the UTXO set into "current" and "obsolete" populations (presumably based on some cutoff or threshold), all nodes still need to verify all rangeproofs.
If there is even a single obsolete rangeproof that is not verified, then nodes joining the network cannot be sure we do not have a -10/+10 situation. Every other rangeproof may be valid in isolation, but that is not sufficient.

So even if we support obsolete UTXOs we still need to include all rangeproofs to cover every unspent output as part of the fast sync (IBD).

But - the only time we use rangeproofs is when an output in the UTXO set is verified, either during fast sync or during tx and block propagation.
Nodes could potentially discard rangeproofs once they are verified.
We would need some subset of nodes to maintain rangeproofs for the full UTXO set so new nodes could sync from them.

Given all this it does feel like there may be more value in working toward supporting some kind of lightweight node (that discards rangeproofs after verification).
Every node needs to verify all UTXO rangeproofs but not every node needs to maintain them once verified.

This would potentially save a significant amount of disk space locally. We could discard rangeproofs and be smart about pruning the rangeproof MMR (we still want to be able to verify the MMR root for each header).

This would not reduce the amount of data necessary for sync, but I don't think we can reduce that anyway, even with old "obsolete" UTXOs.
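
A minimal sketch of the discard-after-verification idea, assuming a prunable rangeproof MMR that keeps hashes (so the root committed in each header still validates) while dropping leaf payloads. Types are hypothetical, not Grin's PMMR API:

```rust
// Hypothetical sketch, not Grin's PMMR implementation.
struct RangeproofMmr {
    /// Hashes are kept for every leaf so MMR roots stay verifiable
    /// against the rangeproof root committed in each header.
    hashes: Vec<u64>,
    /// Leaf payloads (the bulletproof bytes) become prunable.
    proofs: Vec<Option<Vec<u8>>>,
}

impl RangeproofMmr {
    /// Once a proof has been verified, its bytes can be dropped; the
    /// corresponding hash remains, so the root still validates.
    fn discard_verified(&mut self, leaf_idx: usize) {
        self.proofs[leaf_idx] = None;
    }
}

fn main() {
    let mut mmr = RangeproofMmr {
        hashes: vec![0xAB, 0xCD],
        proofs: vec![Some(vec![0u8; 675]), Some(vec![0u8; 675])],
    };
    mmr.discard_verified(0); // proof bytes dropped, hash retained
    assert!(mmr.proofs[0].is_none() && mmr.hashes[0] == 0xAB);
}
```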

Edit:

At height ~200,000 we have the following rangeproof data in our MMR.
This is for a UTXO set of ~44,000 outputs (out of ~1,200,000 total outputs).

```
root@grin-ams-01:/grin/node_mainnet# ls -alh chain_data/txhashset/rangeproof/
total 108M
drwxr-xr-x 2 root root 4.0K Jun  4 10:50 .
drwxr-xr-x 5 root root 4.0K May 22 12:42 ..
-rw-r--r-- 1 root root  93M Jun  4 10:50 pmmr_data.bin
-rw-r--r-- 1 root root  15M Jun  4 10:50 pmmr_hash.bin
-rw-r--r-- 1 root root  84K Jun  4 10:50 pmmr_leaf.bin
-rw-r--r-- 1 root root 243K Jun  2 01:00 pmmr_prun.bin
```

Discarding rangeproofs on a lightweight node would save the 93MB data file and most of the 15MB hash file.
This is approx 50% of the fast sync txhashset downloaded from a peer.

@antiochp
Member

I'm going to close this for now. I suspect the need for all nodes to validate all unspent rangeproofs is going to effectively negate any benefits gained from this.

We know about it and we can always revisit exploration at some future point.
