Implement FlyClient by Benedikt Bunz, Lucianna Kiffer, Loi Luu and Mahdi Zamani #1555
To complement, this could have 2 applications:
|
Marked this for T4 just for the header |
@tromp @ignopeverell Starting to look a bit closer at this (been thinking about it in the background for a bit as well). Our current MMRs (output, rangeproofs and kernel) all implement a "data" file and a "hash" file. Are we thinking of doing something similar here? We have the block headers in the db/index, so we do not necessarily need a data file? The only issue I see with that approach is headers are not a "fixed" size, specifically the PoW. So -
|
I thought bulletproofs are all 674 bytes? For the purpose of verifying accumulated difficulty, we need the sipkeys for each pow. So conceptually, having the data be all headers rather than just all pow is quite attractive. The header MMR would be the one MMR to rule them all. |
We do this to define the Lines 381 to 385 in 11f2d7b
I was assuming this was because they are potentially of varying lengths. How much variation is there in the size of pow (solution?) in our headers? |
That depends on the choice of 2nd PoW on mainnet obviously, but if we have both |
Actually - it's probably not too complex to have a modified To append data to this backend -
We will basically have the existing |
Note that "hash the header" is already done together with the pos (hash_with_index). |
Yeah - I just mean we need to know the pos to index the header against, based on where we insert the hash into the hash file. |
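To make the "hash the header together with its pos" point concrete, here is a minimal sketch. The byte encoding of the position, the use of BLAKE2b and the 32-byte digest are assumptions chosen for illustration, and this `hash_with_index` is a stand-in that only borrows the name from the comment above, not the project's actual implementation.

```python
import hashlib


def hash_with_index(pos: int, data: bytes) -> bytes:
    """Hash `data` together with its MMR position.

    Assumption: the position is prepended as an 8-byte big-endian integer.
    """
    h = hashlib.blake2b(digest_size=32)
    h.update(pos.to_bytes(8, "big"))
    h.update(data)
    return h.digest()


# The same header bytes hash differently at different positions, which is
# why the position must be known before the hash can be computed.
header = b"example header bytes"
hash_at_0 = hash_with_index(0, header)
hash_at_1 = hash_with_index(1, header)
```

Because the position feeds into the hash, the backend has to decide where the entry lands in the hash file first and only then index the header against that position.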
The |
I carried out some refactoring to make the header processing (both sync and during block propagation) a bit easier to reason about. I spiked (a couple of different times) trying to wire a basic "header MMR" into the processing pipeline. Each time I came up against roadblocks getting it to work in practice. Putting down some thoughts here around what I believe to be the complexity here - Blocks vs. Headers
Headers are different though.
The critical point here is we need to support two competing header chains at any given time.
So coming to the conclusion that we cannot do a "Header MMR" without actually maintaining two MMRs internally. Going to explore this a bit further but I'm pretty sure we cannot do this with a single header MMR. |
Replacing prev_hash by a header MMR presents obvious difficulties to syncing, as one needs the frontier of a block's header MMR to verify the next block's header MMR. So when learning of a higher cumdiff block, all one can do is check their PoW, ask for the preceding header (whose id we cannot know), and repeat until it merges onto the history we do have. Only at that point can we go forward and fully check the branch, including block downloads. Once done we would switch head to sync head. |
I'm curious why you think either of these are a problem. We already do 2, as we will pick the most work header which could be advertised by a single peer. There is no way around that. And the way you address the DoS is ban that peer as soon as you realize it lied (see #1139). Regarding 1, rewind should be cheap and headers aren't too large to re-load and hash, so it should be a really minor cost in sync overall (and generally null due to 2). |
There is still a danger of DoS attack if we believe a higher cumdiff with very low current diff. |
I guess what I was trying to say is that referencing the MMR root in itself doesn't increase those risks. |
I don't think I explained my concerns very well earlier, so let me try and describe the scenario that's bothering me - For simplicity we assume height is a reasonable proxy for cumulative difficulty.
Our node then does the following -
The peer uses the locator to identify an approximate fork point and sends back headers on its chain from that point. Let's assume for simplicity the peer sends back headers Our node then processes these new headers.
The issue is we're actually tracking two chains at this point -
The header MMR will only be able to track one of these in its persisted state. Presumably it tracks Back to the example - our node now builds a new locator and asks the peer for more headers.
This "rewind and reapply" is not cheap (relatively speaking, compared to normal header sync). Our "streaming headers" works against us here as well.
And this gets really expensive with a long fork. This is not an issue currently because we track enough state for the two chains, with our Once we introduce state that makes the chain explicit (i.e. the header MMR which represents a single chain), then "rewind and reapply" to flip-flop between the two chains gets expensive for longer forks. Does that make sense? |
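A rough way to see why this grows quickly with fork length is the following sketch. It assumes a simplified cost model that is not taken from the codebase: for every batch of headers received on the fork, the node rewinds to the fork point and reapplies every fork header seen so far. The function name and the model itself are hypothetical.

```python
import math


def headers_applied(fork_len: int, batch: int) -> int:
    """Total header applications under the assumed model.

    After the i-th batch arrives, the fork holds min(i * batch, fork_len)
    headers, and all of them are reapplied from the fork point.
    """
    batches = math.ceil(fork_len / batch)
    return sum(min(i * batch, fork_len) for i in range(1, batches + 1))


print(headers_applied(1000, 8))     # 63000
print(headers_applied(1000, 1000))  # 1000
```

Under this model a 1000-header fork read in batches of 8 costs 63000 header applications, against 1000 if the whole fork is applied in one pass, so the cost is quadratic in the number of batches.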
It does make sense but it doesn't seem too hard to optimize (for a relative version of hard). Here are a few ideas, which you probably already thought of but maybe not in conjunction:
To put this in perspective, consider kernels. A block ultimately could reasonably have a few thousand of those. Even a short fork of a few blocks would need to rewind and reapply 10k or 20k elements arranged in larger batches (blocks). So we do need to be fast at doing that regardless. |
Awesome! Two things: Please also credit my coauthors Lucianna Kiffer, Loi Luu and Mahdi Zamani. And we are working hard to have the paper up as soon as possible. That will hopefully create some clarity. We have a new updated analysis that gives you the optimal sampling strategy and even takes into account changing difficulty. I guess the most important thing is that the MMR (https://eprint.iacr.org/2015/718.pdf) structure remains the same and is the only thing that needs to be implemented in the blockchain. Everything else, like how the proof exactly is constructed is purely a client side change. |
I also think Beam has been working on an implementation. Not sure what the politics are but perhaps it's possible to work together? |
Excuse my ignorance, but what is a locator and how does it allow identification of an approximate fork point? Regarding "have to rewind and reapply for each batch of 8 headers", why is that quadratic behaviour needed? Why not rewind 1000 headers and apply 1005 headers the first time? Am I right in thinking that rewinding an MMR n steps takes time linear in n? |
Same idea as in bitcoin. In short, a node sends the header hashes it knows 2^n back from the head up to genesis. The receiving node identifies the common headers and sends what's after that (on a fork or not).
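As a minimal sketch of that idea: the exact spacing of heights below (head, then head minus 1, 2, 4, 8, ... and finally genesis) is an assumption for illustration and may differ from what the node actually sends. `locator_heights` and `first_common` are hypothetical helper names.

```python
def locator_heights(head: int) -> list:
    """Heights to include in a locator: dense near the head, sparse near genesis."""
    heights = [head]
    step = 1
    while head - step > 0:
        heights.append(head - step)
        step *= 2
    if heights[-1] != 0:
        heights.append(0)  # always finish with genesis
    return heights


def first_common(locator, known_hashes):
    """Receiving side: first locator hash (scanning from the head) that we know."""
    for h in locator:
        if h in known_hashes:
            return h
    return None


# Our chain is a0..a20. The peer shares a0..a10 with us and then forks to b11..b25.
ours = {h: f"a{h}" for h in range(21)}
peer_known = {f"a{h}" for h in range(11)} | {f"b{h}" for h in range(11, 26)}

locator = [ours[h] for h in locator_heights(20)]
common = first_common(locator, peer_known)
print(locator_heights(20))  # [20, 19, 18, 16, 12, 4, 0]
print(common)               # a4
```

The true fork point in this example is height 10, but the locator only contains heights 12 and 4 around it, so the peer starts sending from 4. That is why the fork point is only approximate, and why some headers we already have get sent again.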
Just an idiosyncrasy of parsing headers from the network; it happens to be fastest if we read them in batches of 8. But as I mentioned, we can definitely introduce larger batches when we know there's more handling involved.
Rewind itself is mostly constant. We just jump back to the fork point in one go. That's another nice thing with MMRs. There's a caveat around spent bitmaps that need to be reconstructed however, which is linear. |
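To illustrate why rewinding the hash side of an MMR is a single jump rather than n undo steps, here is a toy in-memory MMR. It is a sketch under assumptions: the node hashing, the way peaks are combined into a root, and the `ToyMMR` class are all simplified stand-ins, and the spent-bitmap caveat mentioned above is not modelled at all.

```python
import hashlib


def node_hash(pos: int, data: bytes) -> bytes:
    # Assumed encoding: 8-byte big-endian position followed by the payload.
    return hashlib.blake2b(pos.to_bytes(8, "big") + data, digest_size=32).digest()


class ToyMMR:
    def __init__(self):
        self.nodes = []   # flat, append-only list standing in for the hash file
        self.leaves = 0

    def append(self, data: bytes) -> None:
        self.nodes.append(node_hash(len(self.nodes), data))
        self.leaves += 1
        n, height = self.leaves, 0
        # Each trailing zero bit of the new leaf count means one merge upward.
        while n % 2 == 0:
            right = len(self.nodes) - 1
            left = right - ((1 << (height + 1)) - 1)
            parent = node_hash(len(self.nodes), self.nodes[left] + self.nodes[right])
            self.nodes.append(parent)
            n //= 2
            height += 1

    def peaks(self) -> list:
        """Peak positions, left to right, derived from the leaf count alone."""
        out, offset = [], 0
        for k in reversed(range(self.leaves.bit_length())):
            if (self.leaves >> k) & 1:
                tree = (1 << (k + 1)) - 1
                out.append(offset + tree - 1)
                offset += tree
        return out

    def root(self) -> bytes:
        # Simplified: hash all peaks together (not necessarily the real bagging order).
        return node_hash(len(self.nodes), b"".join(self.nodes[p] for p in self.peaks()))

    def rewind(self, leaves: int) -> None:
        # An MMR with n leaves has 2n - popcount(n) nodes, so rewinding is
        # one truncation to that size, whatever the distance to the fork point.
        size = 2 * leaves - bin(leaves).count("1")
        del self.nodes[size:]
        self.leaves = leaves


mmr = ToyMMR()
for i in range(5):
    mmr.append(b"header %d" % i)
root_at_5 = mmr.root()
mmr.append(b"header 5")
mmr.append(b"header 6")
mmr.rewind(5)
```

After the rewind the node list has the same length and root as it had at 5 leaves, and headers from a competing chain can be appended from there. With a file-backed hash store the truncation would correspond to shortening the file, which is the sense in which the rewind is roughly constant.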
See #1726 for concrete tasks for testnet4. |
FlyClient (in loose terms) allows us to validate the full header chain by randomly sampling a small subset of headers and providing Merkle proofs that those headers are included in the full header chain (based on the previous header_root committed to in the header). Can we then do something similar for the kernel MMR? Can we randomly sample a small subset of kernels and do the same thing? i.e.
So we prove that n headers are included in the current header chain. Does this allow us to avoid verifying the full kernel sum for the full header chain during fast sync? |
If a random sample of PoWs that the latest block commits to are valid, then you know that the cumulative difficulty is approximately correct. There is no similar guarantee for a sample of kernels. You need to verify every single kernel to know that the current UTXO set is correct. Knowing that 99.999% of kernels are correct is not enough. |
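A small calculation shows the difference. It assumes uniform sampling with replacement, which is a simplification: the FlyClient authors describe a non-uniform sampling strategy, so the numbers are only indicative. `miss_probability` is a hypothetical helper.

```python
def miss_probability(bad_fraction: float, samples: int) -> float:
    """Chance that `samples` uniform random draws all land on valid items."""
    return (1.0 - bad_fraction) ** samples


# PoW case: to fake a meaningful share of the work, a large fraction of the
# headers must be invalid, and a modest sample almost surely hits one.
pow_miss = miss_probability(0.10, 100)

# Kernel case: 99.999% valid kernels means a bad fraction of 0.00001, and
# even 1000 samples will almost always miss the invalid ones.
kernel_miss = miss_probability(0.00001, 1000)
```

With 10% bad headers, 100 samples miss all of them with probability well under 1 in 10,000. With 0.001% bad kernels, 1000 samples miss all of them about 99% of the time, and a single invalid kernel is already enough to make the UTXO set wrong.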
Shame. |
Latest version of flyclient paper at https://eprint.iacr.org/2019/226 |
replacing the prev_hash by the MMR root of all previous block hashes, as presented in
https://www.youtube.com/watch?time_continue=8404&v=BPNs9EVxWrA