Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Block validation flow v2 + Batch (serial) sig verification #2250

Merged
merged 19 commits into from
Jan 25, 2021

Conversation

mratsim
Copy link
Contributor

@mratsim mratsim commented Jan 20, 2021

Overview

This reworks the block validation flow to introduce SigVerified blocks besides the existing blocks and trusted blocks.

This mention but does not introduce TransitionVerified blocks as without generics or static enum that would lead to lots of duplication.

The flow matrix is the following

Signature unchecked Signature verified
State transition unchecked - UntrustedBeaconBlock - SigVerifiedBeaconBlock
- UntrustedIndexedAttestation - TrustedIndexedAttestation
- UntrustedAttestation - TrustedAttestation
---------------------------- ------------------------------- -----------------------------
State transition verified - TransitionVerifiedBeaconBlock - TrustedSignedBeaconBlock
- UntrustedIndexedAttestation - TrustedIndexedAttestation
- UntrustedAttestation - TrustedAttestation

Notice that at the moment TrustedAttestation is imprecise and actually means SigVerifiedAttestation.

The current code checks cryptography then state meaning SigVerifiedAttestation is a state that only exists between those lines of code:

if not batchVerify(sigs, quarantine.sigVerifCache):
return err((ValidationResult.Reject, Invalid))
static: doAssert sizeof(SignedBeaconBlock) == sizeof(SigVerifiedSignedBeaconBlock)
let sigVerifBlock = cast[SigVerifiedSignedBeaconBlock](signedBlock)
let (valRes, blockErr) = addRawBlockCheckStateTransition(dag, quarantine, sigVerifBlock, cache)

Implementation

This reimplements the branch https://github.com/status-im/nimbus-eth2/tree/block-validation-flow which is stuck due to issues mentioned in #2219. In particular, besides workarounds needed to avoid compiler ICE, the branch is stuck trying to deal with generic types in macro nim-lang/RFCs#44

Switching to multithreading

Switching to multithreading requires changing this line

# TODO: For now only enable serial batch verification
return batchVerifySerial(cache, sigs, secureRandomBytes)

to batchVerify and compile with -d:openmp.

beacon_chain/spec/signatures_batch.nim Outdated Show resolved Hide resolved
beacon_chain/spec/signatures_batch.nim Outdated Show resolved Hide resolved

### TransitionVerifiedBeaconBlocks

A block that passes the state transition checks and can be successfully applied to the beacon chain is considered `TransitionVerified`.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hm, the state transition checks all signatures (unless it's a sigverified), so how can this type of block happen? I think in all cases we want to run signature checks before or as part of state transition, but never after (as running state transition on blocks that haven't passed sigver is fishy)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's up to us. In the current state it can't happen because I do crypto verif first hence why TransitionVerified isn't materialized in the code.

That said currently before this PR, within a state_transition epoch/block we did all the checks first and crypto verification last because it is much more expensive. This is an option that we can keep in mind, depending on which worst case scenario we want: either faulty or malicious parties use valid crypto and fake blocks and in that case we want to verify state transition first, or they use valid blocks and invalid crypto and in that case we want to verify crypto first.

beacon_chain/spec/signatures_batch.nim Outdated Show resolved Hide resolved
beacon_chain/spec/signatures_batch.nim Show resolved Hide resolved
return err((ValidationResult.Reject, Invalid))

static: doAssert sizeof(SignedBeaconBlock) == sizeof(SigVerifiedSignedBeaconBlock)
let sigVerifBlock = cast[SigVerifiedSignedBeaconBlock](signedBlock)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this certainly looks fishy... I'm surprised it even works with the List types and Trusted vs normal bls values

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's AFAIK the simplest way to convert. I changed TrustedSig so that it's a "distinct" type from normal Signatures (just an object with the exact same fields) and has the same byte representation.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if it's a distinct type, SigVerifiedBeaconBlock(signedBlock) should work - without the cast - I'm a bit afraid the cast might do bad things to the gc

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok, looked it up, it's not a distinct type, just a similar type.. so when assigning something to something else, the gc needs new refcounts for seqs - I'm not sure that will happen with cast - check the generated C code

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

so, looking a bit more at this, the codegen does some extremely inefficient things for this cast - it will generate a temporary union type with the two types as members, then genericAssign twice, once for each type, in and out of the union.. it's "correct:ish" but extremely wasteful.

here's something better:

var sigVerifBlock: SigVerifiedSignedBeaconBlock
assign(sigVerifiedBlock, cast[ptr SigVerifiedSignedBeaconBlock](signedBlock)[])

using stew/assign - this "should" work as well.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In that case I think casting through pointer in the function call will produce no copy (because verified blocks are never used where they are produced, they are passed to another consumer).

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess, yeah - the only thing I'm thinking about is strict pointer aliasing - the nim impl is inefficient but correct (with stew/assign it would be decently efficient as well but...)..

hm, not sure what the right thing here is tbh - if I look at vtune, copying blocks is one of the highest cpu consumers apart from crypto because of how extremely slow genericAssign is.. we could do the pointer thing which represents a small risk of strict aliasing issues, but we're not looping here or anything such, so perhaps it's fine.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

New codegen

#line 235 "/home/beta/Programming/Status/nim-beacon-chain/beacon_chain/block_pools/clearance.nim"
	T29_ = addRawBlockCheckStateTransition__ln6s2fp4XrmWohuHu9bOV8A(dag, quarantine, (&(*((tyObject_SigVerifiedSignedBeaconBlock__FeS1XMPF6R63Ue3RhifLCg*) (signedBlock)))), (&cache));	valRes = T29_.Field0;	blockErr = T29_.Field1;

// [...]

#line 242 "/home/beta/Programming/Status/nim-beacon-chain/beacon_chain/block_pools/clearance.nim"
	addResolvedBlock__kpy9cmxed3iYehH784xNCUA(dag, quarantine, (&(*(*dag)).clearanceState), (&(*((tyObject_TrustedSignedBeaconBlock__hdN6xn3RN9aAfpVMBC35UEw*) (signedBlock)))), parent, (&cache), onBlockAdded);

beacon_chain/block_pools/clearance.nim Outdated Show resolved Hide resolved
mratsim added a commit that referenced this pull request Jan 21, 2021
mratsim added a commit that referenced this pull request Jan 21, 2021
mratsim added a commit that referenced this pull request Jan 21, 2021
mratsim added a commit that referenced this pull request Jan 21, 2021
@mratsim mratsim closed this Jan 21, 2021
@mratsim mratsim reopened this Jan 21, 2021
@mratsim
Copy link
Contributor Author

mratsim commented Jan 21, 2021

#2255 was needed to unbreak unstable.

I think all concerns are addressed for this PR.

@@ -304,12 +304,16 @@ func toBeaconBlockSummary(v: SomeBeaconBlock): BeaconBlockSummary =
parent_root: v.parent_root,
)

# TODO: we should only store TrustedSignedBeaconBlock in the DB.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

with the new asTrusted where memory layout is the same for all block kinds, this should actually be a simple change

…e pubkey validations is now done first in signatures_batch)
@mratsim
Copy link
Contributor Author

mratsim commented Jan 25, 2021

The PR has been rebased and should be good to merge.

Following #2259 I modified signatures_batch.nim. Previously since signatures and pubkeys were verified at deserialization it could assume that the keys were correct and avoid if foo.isNone(): return false else: foo.unsafeGet() dance. Now signatures_batch.nim assumes that anything can go rogue.

@zah zah merged commit 70a0365 into unstable Jan 25, 2021
@zah zah deleted the block-validation-flow-v2 branch January 25, 2021 18:45
dryajov pushed a commit that referenced this pull request Jan 28, 2021
* bump nim-blscurve

* Outline the block validation flow

* introduce the SigVerified types, pass the tests

* Split clearance/quarantine to prepare for batch crypto verif

* Add a batch signature collector

* Make clearance use SigVerified block and split verification between crypto and state transition

* Always use signedBeaconBlock for the onBlockAdded callback

* RANDAO signing_root is the epoch instead of the full block

* Support skipping BLS for testing

* Fix compilation of the validator client

* Try to fix strange errors MacOS and Jenkins (Clang, unknown type name br_hmac_drbg_context in stdlib_assertions.nim.c)

* address #2250 (comment)

* address #2250 (comment)

* onBlockAdded callback should use TrustedSignedBeaconBlock #2250 (comment)

* address #2250 (comment)

* Use the application RNG: #2250 (comment)

* Improve codegen of conversion zero-cost)

* Quick fixes with loadWithCache after #2259 (TODO: graceful error since pubkey validations is now done first in signatures_batch)

* Graceful handle rogue pubkeys and signatures now that those are lazy-loaded
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants