
sync optimization through parallelization #2837

Closed
cadmuspeverell opened this issue May 22, 2019 · 8 comments

@cadmuspeverell
Contributor

I think we can all agree that downloading the txhashset zip from a slow peer is no fun. I hereby propose an alternative that takes advantage of the unique cryptographic structure of Grin's data, rather than relying on an all-or-nothing generalized algorithm like BitTorrent. But first, some prerequisites.

As I understand it, the following data is downloaded during the initial sync process. Let me know if I’m missing anything.

i. block headers
ii. The Big Zip
    a. kernels + MMR
    b. UTXOs + MMR + bitset
    c. rangeproofs + MMR (probably not needed) + bitset (definitely not needed)
iii. recent blocks

In the future, when we're all using Satoshi's internet of "no biggie" DVD downloads, bulletproof and kernel signature validation will be the pain point. At the rate crypto changes, I'm sure we'll have clever ways of easing those bottlenecks when the time comes. BLS signatures already offer the promise of kernel aggregation, and I have a few promising ideas for doing something similar with the UTXO set. For now, however, we must live with the ancient 2017-era cryptography we've been dealt, and of course with my cryptoanarchist grandmother, who insists on hosting her full node over a VPN in spite of her 56 kbps dial-up internet. But I digress...

The only way to speed up IBD without kicking sweet mawmaw off the network is to parallelize the downloads as much as we reasonably can. Let's look at how we can do just that.

Step 1 - headers. This is an obvious one: FlyClient (https://eprint.iacr.org/2019/226.pdf).
Step 2 - kernels + MMR. Since kernels aren't pruned, these buggers are easy to download and verify in parallel. You have a recent header thanks to step 1; now simply request batches of, say, 1000 kernels and require a compressed Merkle proof that they all belong in the MMR.
Step 3 - UTXOs + MMR + bitset. Finally made it to the fun part. The tricky part is the bitset: there's no proof of work committing to it, so let's change that! Hard fork, anyone? GrinCashJedusorsVision 2.0? Or, for now, we can just request its hash from a bunch of peers and download the one most agreed upon (a small sketch follows after this list). If it ends up being wrong, ban all those jokesters who said it was legit. We'll know very quickly whether it was valid.
Step 3.5 - Time to start downloading those UTXOs. As long as you're only requesting from peers whose bitsets agreed with the majority, you'll both know which peak hashes you need. The syncee can pass those to the syncer along with a batch of bulletproofs and another Merkle proof. Voila. I'm glossing over a bunch of details here, but I want to get this out in the open to get feedback.
Step 4 - rangeproofs and such. Same process as the UTXOs, except we already know the bitset by now.
Step 5 - blocks are already downloaded in parallel. All that remains is...
Step 6 - Profit
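
To make step 3 concrete, here's a minimal sketch of the "most agreed upon" selection, assuming we've already asked a handful of peers for the hash of their bitset. Names and types are purely illustrative, not Grin's actual p2p API:

```rust
use std::collections::HashMap;

/// Hypothetical sketch: pick the bitset hash the most peers vouch for,
/// along with the peers that vouched for it (so we can ban them later
/// if the bitset turns out to be bogus).
fn pick_majority_bitset(
    peer_bitset_hashes: &HashMap<String, [u8; 32]>,
) -> Option<([u8; 32], Vec<String>)> {
    let mut votes: HashMap<[u8; 32], Vec<String>> = HashMap::new();
    for (peer, hash) in peer_bitset_hashes {
        votes.entry(*hash).or_default().push(peer.clone());
    }
    // Download the bitset with the most votes; everything downstream
    // (MMR hashes, outputs) is requested only from peers in this list.
    votes.into_iter().max_by_key(|(_, peers)| peers.len())
}
```

If the chosen bitset later fails validation, we ban every peer in the returned list and retry with the next most popular hash.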

Can Zcash or Monero do all of that!? heh didn't think so.

@garyyu
Contributor

garyyu commented May 23, 2019

Thanks for starting this sync parallelization optimization topic :-)

I understand FlyClient for the headers and the parallelized downloading of tx kernels, i.e. your steps 1 and 2. That's definitely a good direction for optimization.

But for the UTXO set and range proofs, sorry, I didn't catch your suggestion. Could you explain the idea in more detail?

@cadmuspeverell
Contributor Author

cadmuspeverell commented May 23, 2019

Hi Gary. Let me attempt to add some clarity. I'll focus only on outputs, because rangeproofs follow the same protocol, the only difference being that they can reuse the outputs' bitset. I'm not sure how much you've worked with the MMRs, so take no offense if anything I say is obvious.

The first thing that needs to happen is that the syncing node must determine the bitset, in other words the set of unpruned leaf indices. Without a hard fork, this is the weakest part of the design. The proposal I outlined requires communicating with multiple peers to determine said bitset. It's possible for peers to lie about this, and you wouldn't catch them until after downloading some or probably all of the outputs. That would be unfortunate, but it's still not as bad as downloading an entire txhashset zip from a malicious peer.

The more interesting part is downloading the MMR hashes. Once #2813 is merged, all peers will be 'compacting' - as the code calls it - at the same time and should consequently have exactly the same data in their MMRs, ignoring the most recent couple of blocks. Since we have the bitset and we know when they compacted, we have enough data to determine precisely how many hashes there should be in pmmr_hash.bin. A naive approach at this point would be to just download pmmr_hash.bin from one peer, and that may be the first step we take.
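
For illustration, here's a rough sketch of that size calculation, under the simplified model used in this comment (compaction drops exactly the hashes of pruned leaves; helper names are hypothetical):

```rust
/// An MMR over `n` leaves contains 2n - popcount(n) positions in total
/// (one peak per set bit of n).
fn mmr_size(num_leaves: u64) -> u64 {
    2 * num_leaves - u64::from(num_leaves.count_ones())
}

/// Simplified model: compaction removes only the hashes of pruned leaves,
/// so the expected number of entries in pmmr_hash.bin is:
fn expected_hash_count(num_leaves: u64, num_pruned_leaves: u64) -> u64 {
    mmr_size(num_leaves) - num_pruned_leaves
}
```

With the 16-leaf example further down, mmr_size(16) = 31, and pruning the 6 leaves {0, 1, 3, 7, 8, 11} leaves 31 - 6 = 25 hashes.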

But we can do better! Let's insert 16 elements into an MMR and see where that leaves us.

                              30
                      /               \
              14                              29
          /       \                       /       \
      6               13              21              28
    /   \           /   \           /   \           /   \
  2       5       9       12      17      20      24      27
 / \     / \     / \     / \     / \     / \     / \     / \
0   1   3   4   7   8   10  11  15  16  18  19  22  23  25  26
Alright, 16 leaves later we've got ourselves an MMR with 31 hashes. Let's remove elements 0, 1, 3, 7, 8, and 11 and then "compact". Hashes 2, 4, 5, 6, 9, 10, and 12-30 remain. Now let's request hashes 0-14 from peer p1 and 15-29 from peer p2. p1 could send us 2, 4, 5, 6, 9, 10, 12, 13, and 14, but how can we know those hashes are correct? Easy: they send a Mr. Merkle Proof. This proof just contains the remaining MMR peaks - in our case just 29 - which is all we need to verify the MMR root.
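
In code, the Mr. Merkle Proof check for p1's half is tiny: we already hold (or can rebuild) hash 14, p1 supplies the other peak 29, and the two must hash up to the root we're expecting. A hedged sketch, using the sha2 crate as a stand-in for Grin's actual hash function and ignoring position indexing:

```rust
use sha2::{Digest, Sha256};

type Hash = [u8; 32];

/// Parent hash of two child nodes. Grin's real hashing also commits to the
/// node's position in the MMR; omitted here for brevity.
fn parent_hash(left: &Hash, right: &Hash) -> Hash {
    let mut h = Sha256::new();
    h.update(left);
    h.update(right);
    h.finalize().into()
}

/// p1 sent everything under position 14 (including hash 14 itself) plus the
/// other peak, hash 29. The committed root at position 30 must match.
fn verify_range(peak_14: &Hash, peak_29: &Hash, committed_root: &Hash) -> bool {
    &parent_hash(peak_14, peak_29) == committed_root
}
```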

But we can do even better! Instead of sending 2, 4, 5, 6, 9, 10, 12, 13, and 14, p1 can just send 2, 4, 5, 9, 10, and 12, and we can calculate 6, 13, and 14 from the other hashes. And all we have to do to request these hashes is ask a peer for the hashes under peak 14; since we already know what has been compacted - thanks to the bitset - they don't have to send us any other metadata.
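
Continuing the sketch above (same hypothetical `parent_hash` helper), recomputing the hashes p1 no longer needs to send is just:

```rust
/// p1 only sends the hashes we can't derive (2, 4, 5, 9, 10 and 12);
/// positions 6, 13 and 14 are recomputed locally.
fn rebuild_peak_14(h2: &Hash, h5: &Hash, h9: &Hash, h12: &Hash) -> Hash {
    let h6 = parent_hash(h2, h5);   // position 6 = parent of 2 and 5
    let h13 = parent_hash(h9, h12); // position 13 = parent of 9 and 12
    parent_hash(&h6, &h13)          // position 14, the peak we requested
}
```

Note that the leaf hashes 4 and 10 can't be derived from anything else p1 sends, which is presumably why they stay in the list.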

Finally! We have a bitset and we have our MMR hashes. We can now just request the outputs in batches of 50, 100, or whatever we deem appropriate. We can verify these outputs against our MMR hashes, and if they match, we know they belong. Once we have all of the outputs, we can do the mimblewimble magic to verify kernel offsets and output sums and all that fun stuff. If it all checks out, then we know everything we received was correct! If the hashes are all correct but it doesn't add up, we ban everyone who sent us the bitset we just tried and start downloading a different bitset and the MMR hashes all over again.
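
A minimal sketch of that batch check, reusing the same illustrative `Hash`/`Sha256` stand-ins from above (Grin's real leaf hash also commits to the output's MMR position):

```rust
/// Serialized output data for one unpruned leaf (illustrative only).
struct Output {
    bytes: Vec<u8>,
}

/// Hash a downloaded output the same way the MMR leaf was hashed.
fn leaf_hash(output: &Output) -> Hash {
    let mut h = Sha256::new();
    h.update(&output.bytes);
    h.finalize().into()
}

/// Accept a batch only if every output hashes to the leaf hash we already
/// verified against the MMR root. Otherwise the sending peer gets banned.
fn verify_output_batch(outputs: &[Output], expected_leaf_hashes: &[Hash]) -> bool {
    outputs.len() == expected_leaf_hashes.len()
        && outputs
            .iter()
            .zip(expected_leaf_hashes)
            .all(|(o, h)| &leaf_hash(o) == h)
}
```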

Rinse and repeat for rangeproofs, which end up being even easier since we can verify them against the outputs we already downloaded.

Does this all make sense now?

@cadmuspeverell
Contributor Author

Ouch! That diagram is uuuggly. It didn't look that way when I typed it out. sry!

@DavidBurkett
Contributor

Lol, nice! I'd love to get rid of those annoying zips. I proposed [1], and even implemented [2, 3], a similar idea for kernels long ago. My approach allowed us to download kernel batches in parallel, but we could only verify them sequentially, similar to the way we do block sync now. A few devs expressed concerns about it, so I scrapped the idea. I don't recall what they were, though, or whether they would apply to your approach.

I'm still thinking through it all, but 2 things pop out to me:

  1. The leafset. Obviously, that's a fragile approach, since we have no quick way of verifying whether a peer is lying to us. And I don't like the idea of growing the header any, since it's already way too large. Maybe we could instead combine the kernel, output, and rangeproof MMR roots and the hash of the bitmap into a sort of MMR of MMRs? Then the header can just commit to the root of the 4 hashes (rough sketch at the end of this comment). I'm sure that's a highly controversial suggestion, though.

  2. A Merkle proof will confirm that the hashes belong in the MMR, but I don't think it proves the position of those hashes.

[1] #1968
[2] #1987
[3] #2038
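
To illustrate what I mean in point 1: the header could commit to a single root over those four hashes. A purely illustrative sketch (hash function and names are stand-ins, not a concrete proposal):

```rust
use sha2::{Digest, Sha256};

fn hash_pair(left: &[u8], right: &[u8]) -> [u8; 32] {
    let mut h = Sha256::new();
    h.update(left);
    h.update(right);
    h.finalize().into()
}

/// Tiny "MMR of MMRs": pair up the four roots, then hash the two parents.
/// The header would commit to the returned root instead of growing further.
fn combined_root(
    kernel_root: &[u8; 32],
    output_root: &[u8; 32],
    rangeproof_root: &[u8; 32],
    bitmap_hash: &[u8; 32],
) -> [u8; 32] {
    let left = hash_pair(kernel_root, output_root);
    let right = hash_pair(rangeproof_root, bitmap_hash);
    hash_pair(&left, &right)
}
```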

@antiochp
Member

antiochp commented May 24, 2019

There are a bunch of similar/related issues on GitHub tracking a variety of similar and very different approaches to this.

We were tracking something similar here for a while - #952

And kernel specific stuff here - #1219

@antiochp
Member

Quick summary of how I've been thinking about this in the past -

An MMR is implemented as -

  • a hash file
  • a data file
  • (an optional size file)
  • leafset
  • prunelist

Each hash file entry is a hash in the MMR.
Each data file entry is the underlying data at a leaf position.
If we have a size file, each entry is the size of the corresponding data file entry in bytes.
The leafset is a bitset of leaf positions that currently exist in the data file (unspent outputs, for the output MMR).
The prunelist is a bitset of root positions representing subtrees of the MMR that have been fully pruned (i.e. if pos 2 exists in the prunelist then we have pruned/compacted 0 and 1).

To fully rebuild the output MMR we need -

  • the leafset (this effectively describes the UTXO set)
  • the data file (each leafset entry maps to an entry in the data file)
  • the prunelist
  • hashes for each entry in the prunelist (roots of pruned sub trees)

Everything else can be rebuilt locally from the data above.

The leafset, prunelist and datafile all exist on disk on the sending side. It is cheap to send them across like this and this data is all required.

We currently send the full (pruned/compacted) hash file across for simplicity.
We can optimize this by only sending across hashes for entries in the prunelist (roots of pruned subtrees). This would reduce the number of hashes required, similar to what you described above.

Data for leaves (leafset + datafile) and hashes for pruned subtrees (prunelist + hashes) => rebuild MMR.
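
As a rough sketch, the payload requested from a peer to rebuild the output MMR would look something like this (field names are illustrative, not actual Grin types):

```rust
/// Everything required to rebuild (and then verify) the output MMR locally.
struct OutputMmrRebuildData {
    /// Bitset of leaf positions present in the data file (the UTXO set).
    leafset: Vec<u8>,
    /// Bitset of root positions of fully pruned subtrees.
    prunelist: Vec<u8>,
    /// One serialized output per bit set in the leafset.
    outputs: Vec<Vec<u8>>,
    /// One hash per prunelist entry (the roots of the pruned subtrees).
    pruned_subtree_roots: Vec<[u8; 32]>,
}
```

All the remaining hashes, the peaks, and the root can then be recomputed locally and checked against the header.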

@cadmuspeverell
Contributor Author

cadmuspeverell commented May 27, 2019

Thx for your comments everyone!

The leafset. Obviously, that's a fragile approach, since we have no quick way of verifying if a peer is lying to us.

I don't disagree, @DavidBurkett. This is why I propose checking with multiple peers first to make sure they're in agreement. In practice, the majority will surely agree on the correct bitsets. Since the worst a malicious peer can do is delay the sync process, this concern is mostly of academic interest.

And I don't like the idea of growing the header any, since it's already way too large. Maybe we could instead combine the kernel, output, and rangeproof MMR roots and the hash of the bitmap in sort of an MMR of MMRs? Then the header can just commit to the root of the 4 hashes.

I'm not 100% convinced yet, but I do tend to agree, and after thinking on this a bit I decided to plagiarize your idea, minus the bitmaps: #2740 (comment). This shouldn't affect FlyClient either, as we can just include a proof of the previous header root when it's needed during header sync.

A merkle proof will confirm that the hashes belong in the MMR, but I don't think that proves the position of those hashes.

True, but they would also have to align with the leafset and prunelist. Even so, in a very rare scenario a peer could in theory manage to convince you that the hashes belong in a different position than they actually do. Of course, it would fail verification shortly afterward. If we wanted to punish such a peer, we could identify them fairly easily in a number of different ways.

To fully rebuild the output MMR we need -

the leafset (this effectively describes the UTXO set)
the data file (each leafset entry maps to an entry in the data file)
the prunelist
hashes for each entry in the prunelist (roots of pruned sub trees)

You're entirely correct, @antiochp. My description used an oversimplification of the pruning process. In reality, anywhere I said bitset or leafset, I was really referring to the leafset and prunelist combined. Adjusting for that, my proposal provides a way to get all of that information, and as such gives the node enough info to rebuild and verify everything.

The leafset, prunelist and datafile all exist on disk on the sending side. It is cheap to send them across like this and this data is all required.

If this were truly as cheap as you say, we would not need to have this conversation. In truth, it's quite the opposite. Just consider the case of the rangeproofs, where, with minimal daily usage, Grin already requires nearly 100 MB to store the fully pruned and compacted pmmr_data.bin. Good luck downloading that bugger from mawmaw!

We can optimize this by only sending across hashes for entries in the prunelist (roots of pruned subtrees).

Indeed we could. This would save us some downloading, but it would delay our verification of the MMR root under my scheme. It's definitely a trade-off worth discussing!

Edit: Where do we go from here? I gather that numerous people have numerous different, incompatible ideas about how best to improve IBD. Is this something the council has the final say on, or is there a community vote or something?

@quentinlesceller
Member

Closing for mimblewimble/grin-rfcs#29 and #3289.
