[SYNC PERFORMANCE] Replace header proof serialisation with more efficient algorithm #3670


Merged: 10 commits merged into mimblewimble:master on Dec 6, 2021

Conversation

yeastplume (Member):

In performing tests for the PIBD work, it's become evident that header validation is not as efficient as it could be. There are a few reasons for this (more will be addressed in coming PRs), but much of it comes down to the number of calls to the difficulty iterator in store::DifficultyIter, which deserialises several entire headers from the DB each time it's called. The iterator's next method is called 60 times on each header validation to get the block's expected difficulty, and profiling this uncovered a large performance issue around the inefficient BitVec struct.

A test (ignored in CI, meant to be run manually against live chain data) is included that copies and validates 100k headers from one chain to another. Before this change, this test in release mode (Mac, 2.3 GHz 8-Core Intel i9) took about 76 seconds. With this change, the test takes 35 seconds. Flamegraphs are attached demonstrating the before and after (note the huge amount of time spent in DifficultyIter::next()).

Before: [flamegraph image]

After: [flamegraph image]

Note there's still more to do here performance-wise, but this was an obvious first candidate for optimisation. I also investigated the bitpacking library https://github.com/quickwit-inc/bitpacking, which uses SSE and AVX2 instructions to optimise further where available, but it unfortunately only supports packing up to u32.

Exact Changes:

  • Remove BitVec
  • Add pack_bits function, which packs an array of u64s to a specified bit length more efficiently than the previous BitVec implementation
  • Add Proof::pack_nonces function, which packs the proof's nonces via pack_bits
  • Add Proof::pack_len helper function, which returns the packed byte length, and use it instead of the previous BitVec calculation
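For readers following the review below, here is a minimal self-contained sketch of what a pack_bits of this shape can look like. The match on Ordering and the names bit_width, mini_buffer, cursor and pack_bytes_remaining mirror the fragments quoted in the review, but this is a reconstruction, not the exact PR code. It assumes bit_width < 64, each value fits in bit_width bits, and compressed holds exactly (uncompressed.len() * bit_width + 7) / 8 bytes.

```rust
use std::cmp::Ordering;

fn pack_bits(bit_width: usize, uncompressed: &[u64], mut compressed: &mut [u8]) {
    assert!(bit_width > 0 && bit_width < 64);
    // Accumulate bits in a u64 mini buffer; whenever it fills up,
    // copy it out to `compressed` and start over.
    let mut mini_buffer = 0u64;
    let mut cursor = 0; // number of bits written into mini_buffer
    let mut pack_bytes_remaining = compressed.len();
    for &el in uncompressed {
        let remaining = 64 - cursor;
        match bit_width.cmp(&remaining) {
            Ordering::Less => {
                // plenty of room left in the mini buffer
                mini_buffer |= el << cursor;
                cursor += bit_width;
            }
            Ordering::Equal => {
                // el exactly fills the mini buffer: flush it
                mini_buffer |= el << cursor;
                compressed[..8].copy_from_slice(&mini_buffer.to_le_bytes());
                compressed = &mut compressed[8..];
                pack_bytes_remaining -= 8;
                mini_buffer = 0;
                cursor = 0;
            }
            Ordering::Greater => {
                // el overflows the mini buffer: flush, then carry the high bits
                mini_buffer |= el << cursor;
                compressed[..8].copy_from_slice(&mini_buffer.to_le_bytes());
                compressed = &mut compressed[8..];
                pack_bytes_remaining -= 8;
                mini_buffer = el >> remaining;
                cursor = bit_width - remaining;
            }
        }
    }
    if pack_bytes_remaining > 0 {
        // flush the trailing partial mini buffer
        compressed[..pack_bytes_remaining]
            .copy_from_slice(&mini_buffer.to_le_bytes()[..pack_bytes_remaining]);
    }
}

fn main() {
    // four 4-bit values pack into two bytes, LSB-first within each u64
    let mut buf = [0u8; 2];
    pack_bits(4, &[0x1, 0x2, 0x3, 0x4], &mut buf);
    assert_eq!(buf, [0x21, 0x43]);
}
```

The key design point relative to BitVec is that bits are assembled a whole word at a time rather than one bit at a time, so each input value costs a couple of shifts and ORs instead of bit_width individual bit writes.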

@yeastplume (Member Author):

Also, suggestions to speed up pack_bits welcome!

@@ -448,8 +511,7 @@ impl Readable for Proof {
 // prepare nonces and read the right number of bytes
 let mut nonces = Vec::with_capacity(global::proofsize());
 let nonce_bits = edge_bits as usize;
-let bits_len = nonce_bits * global::proofsize();
-let bytes_len = BitVec::bytes_len(bits_len);
+let bytes_len = (nonce_bits * global::proofsize() + 7) / 8;
Contributor: Should use pack_len here?!
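For context, a pack_len of the kind referred to here would presumably just be the ceiling-division byte count. The following is a hypothetical free-function reconstruction (in the PR the helper lives on Proof):

```rust
// Hypothetical sketch of a pack_len-style helper: the number of bytes
// needed to hold `proofsize` nonces of `nonce_bits` bits each.
fn pack_len(nonce_bits: usize, proofsize: usize) -> usize {
    (nonce_bits * proofsize + 7) / 8
}

fn main() {
    // e.g. 42 nonces at 29 bits each -> 1218 bits -> 153 bytes
    assert_eq!(pack_len(29, 42), 153);
}
```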

// We accumulate bits in it until capacity, at which point we just copy this
// mini buffer to compressed.
let mut mini_buffer: u64 = 0u64;
let mut cursor = 0; //< number of bits written in the mini_buffer.
Contributor: I'd use only remaining, no cursor.

let mut cursor = 0; //< number of bits written in the mini_buffer.
let mut pack_bytes_remaining = compressed.len();
for el in uncompressed {
let remaining = 64 - cursor;
Contributor: This becomes redundant.

Contributor: mini_buffer |= el << (64 - remaining) happens in all 3 cases, and so should be done here before the comparison.

Ordering::Less => {
// Plenty of room remaining in our mini buffer.
mini_buffer |= el << cursor;
cursor += bit_width;
@tromp (Contributor), Dec 3, 2021: This becomes remaining -= bit_width.

compressed = &mut compressed[8..];
pack_bytes_remaining -= 8;
mini_buffer = 0u64;
cursor = 0;
@tromp (Contributor), Dec 3, 2021: This becomes remaining -= 64.

}
Ordering::Greater => {
mini_buffer |= el << cursor;
// We have completed our minibuffer.
Contributor: I'd say overflowed :-)

let mut pack_bytes_remaining = compressed.len();
for el in uncompressed {
let remaining = 64 - cursor;
match bit_width.cmp(&remaining) {
Contributor: There's too much overlap between the Equal and Greater cases to merit separating them.

// mini buffer to compressed.
let mut mini_buffer: u64 = 0u64;
let mut cursor = 0; //< number of bits written in the mini_buffer.
let mut pack_bytes_remaining = compressed.len();
Contributor: No need to keep track of this.

}
}
}
if pack_bytes_remaining > 0 {
Contributor: pack_bytes_remaining is simply compressed.len() % 8.
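Folding the suggestions above together (track only remaining, hoist the common OR above the comparison, drop the Ordering match and pack_bytes_remaining) gives a loop along these lines. This is a sketch of the direction of the review, not the code as merged, under the same assumptions as before (0 < bit_width < 64, each value fits in bit_width bits, output sized to the exact packed length):

```rust
fn pack_bits(bit_width: usize, uncompressed: &[u64], mut compressed: &mut [u8]) {
    assert!(bit_width > 0 && bit_width < 64);
    let mut mini_buffer = 0u64;
    let mut remaining = 64; // free bits left in the mini buffer
    for &el in uncompressed {
        // common to all cases, so done once before the comparison
        mini_buffer |= el << (64 - remaining);
        if bit_width < remaining {
            remaining -= bit_width;
        } else {
            // mini buffer filled (or overflowed): flush it
            compressed[..8].copy_from_slice(&mini_buffer.to_le_bytes());
            compressed = &mut compressed[8..];
            // carry any bits of `el` that did not fit; the flush frees 64 bits
            mini_buffer = el >> remaining;
            remaining = remaining + 64 - bit_width;
        }
    }
    // whatever is left of `compressed` here is the original length % 8
    let tail = compressed.len();
    if tail > 0 {
        compressed[..tail].copy_from_slice(&mini_buffer.to_le_bytes()[..tail]);
    }
}

fn main() {
    let mut buf = [0u8; 2];
    pack_bits(4, &[0x1, 0x2, 0x3, 0x4], &mut buf);
    assert_eq!(buf, [0x21, 0x43]);
}
```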

@tromp (Contributor) left a review:

Looking good; left some suggestions for minor improvement.

@tromp self-requested a review December 3, 2021 22:33
@tromp (Contributor) left a review:

See the comments with suggestions for minor improvement.

We should similarly rewrite the repeated calls to read_number in Proof::read
into a single unpack function.
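As an illustration of that suggestion, a single unpack function might look as follows. This is a hypothetical sketch: the name unpack_bits and its signature are inventions for illustration, not the API of Proof::read.

```rust
// Hypothetical sketch of the inverse of pack_bits: read `count` values of
// `bit_width` bits each back out of a packed little-endian byte slice.
fn unpack_bits(bit_width: usize, count: usize, packed: &[u8]) -> Vec<u64> {
    assert!(bit_width > 0 && bit_width < 64);
    let mask = (1u64 << bit_width) - 1;
    let mut out = Vec::with_capacity(count);
    for i in 0..count {
        let bit_pos = i * bit_width;
        let (byte, bit) = (bit_pos / 8, bit_pos % 8);
        // gather the (at most 9) bytes covering this value into a u128,
        // then shift and mask the value out
        let mut window = 0u128;
        for (j, &b) in packed[byte..].iter().take(9).enumerate() {
            window |= (b as u128) << (8 * j);
        }
        out.push((window >> bit) as u64 & mask);
    }
    out
}

fn main() {
    // round-trips the four 4-bit values packed as [0x21, 0x43]
    assert_eq!(unpack_bits(4, 4, &[0x21, 0x43]), vec![1, 2, 3, 4]);
}
```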

@yeastplume (Member Author):

Thank you; the compression function has been greatly compressed with the suggestions above.

@yeastplume yeastplume merged commit 7725a05 into mimblewimble:master Dec 6, 2021
@yeastplume yeastplume deleted the header_sync_perf branch January 20, 2022 10:01
bayk added a commit to mwcproject/mwc-node that referenced this pull request Jun 21, 2024
…n with more efficient algorithm (mimblewimble#3670)

* replace bitvec with more efficient bitpack algorithm
* optimise proof_unpack_len
* move proof pack length calculation
* small refactor
* integrate suggestions in mimblewimble#3670
* finish compressing compression function
* remove ordering cmp from pack function
* remainder fix for new logic
* remove println statements
* remove ordering import warning