Lazy aggregated batch verification #3212
Conversation
A novel optimisation for attestation and sync committee message validation: when batching, we look for signatures of the same message and aggregate these before batch-validating. This results in 10-20% fewer signature verifications on a busy server, leading to a significant reduction in CPU usage.
* increase batch size slightly, which helps finding more aggregates
* add metrics for batch verification efficiency
* use simple `blsVerify` when there is only one signature to verify in the batch, avoiding the RNG
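For illustration, here is a minimal sketch of the grouping step. It is not the PR's actual code: `PubKey`, `Sig`, `SigSet` and `lazyAggregate` are hypothetical stand-ins, and `xor` is only a placeholder for real BLS point aggregation in nim-blscurve.

```nim
import std/tables

type
  # Hypothetical stand-ins for the cooked BLS types carried by the real
  # SignatureSet; actual aggregation is elliptic-curve point addition.
  PubKey = object
    bits: array[48, byte]
  Sig = object
    bits: array[96, byte]
  SigSet = tuple[pubkey: PubKey, root: string, sig: Sig]

proc combine(a: var PubKey, b: PubKey) =
  for i in 0 ..< a.bits.len: a.bits[i] = a.bits[i] xor b.bits[i]

proc combine(a: var Sig, b: Sig) =
  for i in 0 ..< a.bits.len: a.bits[i] = a.bits[i] xor b.bits[i]

proc lazyAggregate(pending: seq[SigSet]): seq[SigSet] =
  ## Fold entries that sign the same message (same signing root) into a
  ## single aggregate, so the batch verifier performs one pairing check
  ## per distinct message instead of one per signature.
  var byRoot = initTable[string, int]() # signing root -> index in result
  for entry in pending:
    if entry.root in byRoot:
      let i = byRoot[entry.root]
      result[i].pubkey.combine(entry.pubkey) # same message: aggregate keys
      result[i].sig.combine(entry.sig)       # ...and signatures
    else:
      byRoot[entry.root] = result.len
      result.add entry

when isMainModule:
  var pending: seq[SigSet]
  pending.add((pubkey: PubKey(), root: "root-A", sig: Sig()))
  pending.add((pubkey: PubKey(), root: "root-A", sig: Sig()))
  pending.add((pubkey: PubKey(), root: "root-B", sig: Sig()))
  echo lazyAggregate(pending).len # 2: the two root-A entries were merged
```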
@@ -74,7 +89,7 @@ const
   # (RNG for blinding and Final Exponentiation)
   # are amortized,
   # but not too big as we need to redo checks one-by-one if one failed.
-  BatchedCryptoSize = 16
+  BatchedCryptoSize = 24
Are there measurements of how often failures that necessitate backtracking/retrying the batching occur? This would become more acute with larger batches.
The only time I've seen them is when we had a bug in the signature verification function. By and large, during an attack, each node will locally discard any faulty signatures so they do not spread - that means that only during a targeted attack against an individual node can there be a failure here.
This shows a mainnet node at batchSize = 24, 32 and 72 respectively.
One issue with the way that validation is set up is that in libp2p, the "next" gossip message from a single node is not processed until the current one has been validated. This decreases concurrency, and therefore also the ability to build meaningful batches. It is especially problematic for lazy aggregation because one node is likely to be sending multiple messages for the same data when dealing with an attestation burst: all attestations on a single subnet will typically have the same message being signed, and nodes listen to particular subnets - cc @Menduist @dryajov
LGTM
I prefer constructing the SignatureSet in-place in the sequence, because returning extra-large datatypes (in this case a tuple) from init tends to create very inefficient code in Nim, with lots of memset(0) and, here, probably a large stack allocation/reclaim.
Nitpick: This high batch number might need to be a compile-time const for minimal-sized testnets? If the committees/attestations never reach 72, we would always have a high 30ms delay.
-    pendingBuffer: seq[SignatureSet]
-    resultsBuffer: seq[Future[BatchResult]]
+    sigsets: seq[SignatureSet]
+    items: seq[BatchItem]

   BatchCrypto* = object
     # Each batch is bounded by BatchedCryptoSize (16) which was chosen:
Comment needs to be updated to 72.
yeah, I'm playing around with different values to strike a good balance - it turns out that in practice, on a really heavily loaded mainnet server, after applying the libp2p fix to validate messages concurrently, the average batch size works out as follows:
- max = 72 gives 28-30 signatures per batch and 2.4-2.6 signatures per aggregate
- max = 36 gives 17-18 signatures per batch and 1.7-1.9 signatures per aggregate
This suggests that some batches are large, but most are much smaller than the max - basically, when max is large, we occasionally use the given space, but not always - it's also likely that the additional space gives aggregation a better chance simply because we collect signatures from different subnets (and it is only per subnet that aggregation can happen).
An earlier version tried using a separate batcher per subnet explicitly, but that doesn't work very well at all - there's not enough traffic on a single subnet to consistently fill up the batches, so overall, it lowers the efficiency of batching.
high 30ms delay.
The "maximum" delay when the batch is not full remains at 10ms, or whenever the async loop wakes up. I suspect increasing this timeout would improve batching and aggregation, but it would indeed hurt the case you're describing.
72 vs 36, mainnet server
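For context, the flush condition discussed above - process the batch when it is full, or after a short deadline when it is not - can be sketched as below. `PendingBatch`, `shouldFlush` and `BatchDeadline` are illustrative names and values, not the actual nimbus-eth2 implementation, which drives this from the chronos event loop.

```nim
import std/times

const
  BatchedCryptoSize = 72 # max batch size from the PR discussion
  BatchDeadline = initDuration(milliseconds = 10) # assumed flush deadline

type
  PendingBatch = object
    sigsets: seq[string] # stand-in for seq[SignatureSet]
    created: Time

proc shouldFlush(b: PendingBatch): bool =
  ## Flush either when the batch is full or once it has waited long enough:
  ## a larger max batch improves aggregation odds, while a non-full batch
  ## still only waits roughly the deadline before being verified.
  b.sigsets.len >= BatchedCryptoSize or
    BatchDeadline <= getTime() - b.created

when isMainModule:
  var batch = PendingBatch(created: getTime())
  batch.sigsets.add "some signature set"
  echo batch.shouldFlush() # false until full or ~10ms have passed
```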
Important to note is that "signatures per batch" != "verifications per batch": when we have 2.5 signatures per aggregate, we're performing 30/2.5 = 12 verifications per batch.
moved and expanded commentary - back to 72: it's ridiculously more efficient in terms of throughput and helps significantly with maintaining a healthy mesh on a slow node, giving it more time to deal with anomalies
      startTick = Moment.now()
      ok =
        if batchSize == 1: blsVerify(batch[].sigsets[0])
        else: batchCrypto.verifier.batchVerify(batch[].sigsets)
It might be better to move this into nim-blscurve, thoughts?
sure, that makes a lot of sense - this PR or separate?
ok, that was easier said than done: nim-blscurve doesn't know about the RNG, so it can't avoid generating randomness in the case of a single signature. Leaving for a potential future refactoring...
-               signature: CookedSig) =
+func init(T: type SignatureSet,
+          pubkey: CookedPubKey, signing_root: Eth2Digest,
+          signature: CookedSig): T =
Returning large value types tends to produce quite inefficient code, and a SigSet is huge.
this should be fixed as of nim-lang/Nim#19115 which landed in the nim 1.2 branch as well
This is the generated code: no memsets, RVO working as it should - I suspect you've been burnt by the nim bug previously:
N_LIB_PRIVATE N_NIMCALL(void, init__7Ffc9cVDV9aLbtP7bpwJ5yNA)(tyObject_PublicKey__wF9cg4i4Nl9cNaHoaQ6NXiVA *pubkey, tyObject_MDigest__law9ct65KplMYBvtmjCQxbw *signing_root, tyObject_Signature__ZPec3zScfAlRHnqARj9a9asg *signature, tyTuple__7PghGCO9ajM9a39c4M2A31RqQ *Result)
{
(*Result).Field0 = (*pubkey);
nimCopyMem((void *)(*Result).Field1, (NIM_CONST void *)(*signing_root).data, sizeof(tyArray__vEOa9c5qaE9ajWxR5R4zwfQg));
(*Result).Field2 = (*signature);
}
ok, I've dug around a bit and couldn't find any instances of this happening with the new function call style - as noted above, this was likely the result of a bug in the nim compiler that has been fixed since.
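To make the pattern under discussion concrete, here is a minimal self-contained sketch of an `init` that constructs and returns a large tuple, relying on (N)RVO. `BigSet` and its field sizes are made up for illustration and are not the actual SignatureSet definition.

```nim
type
  # Hypothetical large value type standing in for SignatureSet, which in
  # nimbus-eth2 is a tuple of (public key, signing root, signature).
  BigSet = tuple[pubkey: array[48, byte],
                 root: array[32, byte],
                 sig: array[96, byte]]

func init(T: type BigSet,
          pubkey: array[48, byte],
          root: array[32, byte],
          sig: array[96, byte]): T =
  # The tuple is constructed directly into the result slot; with RVO the
  # caller's destination is written in place, without a large temporary.
  (pubkey: pubkey, root: root, sig: sig)

when isMainModule:
  var sets: seq[BigSet]
  sets.add BigSet.init(default(array[48, byte]),
                       default(array[32, byte]),
                       default(array[96, byte]))
  echo sets.len # 1
```

The generated C shown above corresponds to this kind of call: the fields are copied straight into the caller-provided Result pointer, with no memset.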
Followup of #3212 to test proper signature verification. Also document possible further optimization based on blst `v0.3.13`.
A novel optimisation for attestation and sync committee message validation: when batching, we look for signatures of the same message and aggregate these before batch-validating. This results in up to 60% fewer signature verifications on a busy server, leading to a significant reduction in CPU usage.
* use simple `blsVerify` when there is only one signature to verify in the batch, avoiding the RNG