-
Notifications
You must be signed in to change notification settings - Fork 992
[SYNC PERFORMANCE] Replace header proof serialisation with more efficient algorithm #3670
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Also, suggestions to speed up |
core/src/pow/types.rs
Outdated
@@ -448,8 +511,7 @@ impl Readable for Proof { | |||
// prepare nonces and read the right number of bytes | |||
let mut nonces = Vec::with_capacity(global::proofsize()); | |||
let nonce_bits = edge_bits as usize; | |||
let bits_len = nonce_bits * global::proofsize(); | |||
let bytes_len = BitVec::bytes_len(bits_len); | |||
let bytes_len = (nonce_bits * global::proofsize() + 7) / 8; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should use pack_len here?!
core/src/pow/types.rs
Outdated
// We accumulate bits in it until capacity, at which point we just copy this | ||
// mini buffer to compressed. | ||
let mut mini_buffer: u64 = 0u64; | ||
let mut cursor = 0; //< number of bits written in the mini_buffer. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i'd use only remaining, no cursor.
core/src/pow/types.rs
Outdated
let mut cursor = 0; //< number of bits written in the mini_buffer. | ||
let mut pack_bytes_remaining = compressed.len(); | ||
for el in uncompressed { | ||
let remaining = 64 - cursor; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this becomes redundant
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
mini_buffer |= el << (64 - remaining)
happens in all 3 cases, and so should be done here before the comparison.
core/src/pow/types.rs
Outdated
Ordering::Less => { | ||
// Plenty of room remaining in our mini buffer. | ||
mini_buffer |= el << cursor; | ||
cursor += bit_width; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this becomes remaining -= bit_width
core/src/pow/types.rs
Outdated
compressed = &mut compressed[8..]; | ||
pack_bytes_remaining -= 8; | ||
mini_buffer = 0u64; | ||
cursor = 0; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this becomes remaining -= 64
core/src/pow/types.rs
Outdated
} | ||
Ordering::Greater => { | ||
mini_buffer |= el << cursor; | ||
// We have completed our minibuffer. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i'd say overflowed:-)
core/src/pow/types.rs
Outdated
let mut pack_bytes_remaining = compressed.len(); | ||
for el in uncompressed { | ||
let remaining = 64 - cursor; | ||
match bit_width.cmp(&remaining) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There's too much overlap between cases Equal and Greater to merit separating.
core/src/pow/types.rs
Outdated
// mini buffer to compressed. | ||
let mut mini_buffer: u64 = 0u64; | ||
let mut cursor = 0; //< number of bits written in the mini_buffer. | ||
let mut pack_bytes_remaining = compressed.len(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
no need to keep track of this.
core/src/pow/types.rs
Outdated
} | ||
} | ||
} | ||
if pack_bytes_remaining > 0 { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
pack_bytes_remaining is simply compressed.len() % 8.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
looking good; left some suggestions for minor improvement.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
See the comments with suggestions for minor improvement.
We should similarly rewrite the repeated calls to read_number in Proof::read
into a single unpack function.
Thank you, the compression function has been greatly compressed with suggestions above |
…n with more efficient algorithm (mimblewimble#3670) * replace bitvec with more efficient bitpack algorithm * optimise proof_unpack_len * move proof pack length calculation * small refactor * integrate suggestions in mimblewimble#3670 * finish compressing compression function * remove ordering cmp from pack function * remainder fix for new logic * remove println statements * remove ordering import warning
In performing test for PIBD work, it's become evident that header validation is not as efficient as it could be. There are a few reasons for this (with more to be addressed in coming PRs), but much of it comes down to the number of calls to the difficulty iterator in
store::DifficultyIter
, which deserialises several entire headers from the DB each time it's called. This iterator next method is called 60 times on each header validation to get the block's expected difficulty, which uncovered a large performance issue around the inefficient 'BitVec' struct.A test (ignored in CI, meant to be run manually against live chain data) is included that copies and validates 100k headers from one chain to another. Before this change, this test in release mode (Mac, 2.3 Ghz 8-Core Intel i9) took about 76 seconds. With this change, the test takes 35 seconds. Flamegraphs are attached demonstrating the before and after (note the huge amount of time spent in
DifficultyIter::next()
)Before:

After:

Note there's still more to do here performance wise, but was an obvious first candidate for optimisation. Also investigated the use of a bitpacking lib https://github.com/quickwit-inc/bitpacking that actually uses SSE and AVX2 instructions to optimise further if available, but it unfortunately only supports packing up to u32.
Exact Changes:
pack_bits
function, which packs an array of u64s to a specified bit length more efficiently than the previous bitvec implementationProof::pack_nonces
function, which packs an array of u64s to a specified bit length more efficientlyProof::pack_len
helper function and use instead of previous bitvec