[SYNC PERFORMANCE] Replace header proof serialisation with more efficient algorithm #3670


Merged: 10 commits merged into mimblewimble:master on Dec 6, 2021

Conversation

yeastplume (Member):

In performing tests for the PIBD work, it's become evident that header validation is not as efficient as it could be. There are a few reasons for this (more will be addressed in coming PRs), but much of it comes down to the number of calls to the difficulty iterator in store::DifficultyIter, which deserialises several entire headers from the DB each time it's called. The iterator's next method is called 60 times on each header validation to get the block's expected difficulty, and profiling this uncovered a large performance issue around the inefficient BitVec struct.

A test (ignored in CI, meant to be run manually against live chain data) is included that copies and validates 100k headers from one chain to another. Before this change, this test in release mode (Mac, 2.3 GHz 8-Core Intel i9) took about 76 seconds. With this change, the test takes 35 seconds. Flamegraphs are attached demonstrating the before and after (note the huge amount of time spent in DifficultyIter::next()).

Before: [flamegraph image]

After: [flamegraph image]

Note there's still more to do here performance-wise, but this was an obvious first candidate for optimisation. I also investigated the bitpacking library https://github.com/quickwit-inc/bitpacking, which uses SSE and AVX2 instructions to optimise further where available, but it unfortunately only supports packing up to u32.

Exact Changes:

  • Remove BitVec
  • Add pack_bits function, which packs an array of u64s to a specified bit length more efficiently than the previous BitVec implementation
  • Add Proof::pack_nonces function, which packs the proof's nonces via pack_bits
  • Add Proof::pack_len helper function, which returns the packed byte length, and use it instead of the previous BitVec calculation
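For readers following the review below, here is a minimal self-contained sketch of what a pack_bits of this shape can look like. The match on Ordering and the names bit_width, mini_buffer, cursor and pack_bytes_remaining mirror the fragments quoted in the review, but this is a reconstruction, not the exact PR code. It assumes bit_width < 64, each value fits in bit_width bits, and compressed holds exactly (uncompressed.len() * bit_width + 7) / 8 bytes.

```rust
use std::cmp::Ordering;

fn pack_bits(bit_width: usize, uncompressed: &[u64], mut compressed: &mut [u8]) {
    assert!(bit_width > 0 && bit_width < 64);
    // Accumulate bits in a u64 mini buffer; whenever it fills up,
    // copy it out to `compressed` and start over.
    let mut mini_buffer = 0u64;
    let mut cursor = 0; // number of bits written into mini_buffer
    let mut pack_bytes_remaining = compressed.len();
    for &el in uncompressed {
        let remaining = 64 - cursor;
        match bit_width.cmp(&remaining) {
            Ordering::Less => {
                // plenty of room left in the mini buffer
                mini_buffer |= el << cursor;
                cursor += bit_width;
            }
            Ordering::Equal => {
                // el exactly fills the mini buffer: flush it
                mini_buffer |= el << cursor;
                compressed[..8].copy_from_slice(&mini_buffer.to_le_bytes());
                compressed = &mut compressed[8..];
                pack_bytes_remaining -= 8;
                mini_buffer = 0;
                cursor = 0;
            }
            Ordering::Greater => {
                // el overflows the mini buffer: flush, then carry the high bits
                mini_buffer |= el << cursor;
                compressed[..8].copy_from_slice(&mini_buffer.to_le_bytes());
                compressed = &mut compressed[8..];
                pack_bytes_remaining -= 8;
                mini_buffer = el >> remaining;
                cursor = bit_width - remaining;
            }
        }
    }
    if pack_bytes_remaining > 0 {
        // flush the trailing partial mini buffer
        compressed[..pack_bytes_remaining]
            .copy_from_slice(&mini_buffer.to_le_bytes()[..pack_bytes_remaining]);
    }
}

fn main() {
    // four 4-bit values pack into two bytes, LSB-first within each u64
    let mut buf = [0u8; 2];
    pack_bits(4, &[0x1, 0x2, 0x3, 0x4], &mut buf);
    assert_eq!(buf, [0x21, 0x43]);
}
```

The key design point relative to BitVec is that bits are assembled a whole word at a time rather than one bit at a time, so each input value costs a couple of shifts and ORs instead of bit_width individual bit writes.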

@yeastplume (Member Author):

Also, suggestions to speed up pack_bits welcome!

@@ -448,8 +511,7 @@ impl Readable for Proof {
 // prepare nonces and read the right number of bytes
 let mut nonces = Vec::with_capacity(global::proofsize());
 let nonce_bits = edge_bits as usize;
-let bits_len = nonce_bits * global::proofsize();
-let bytes_len = BitVec::bytes_len(bits_len);
+let bytes_len = (nonce_bits * global::proofsize() + 7) / 8;
Contributor: Should use pack_len here?!
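For context, a pack_len of the kind referred to here would presumably just be the ceiling-division byte count. The following is a hypothetical free-function reconstruction (in the PR the helper lives on Proof):

```rust
// Hypothetical sketch of a pack_len-style helper: the number of bytes
// needed to hold `proofsize` nonces of `nonce_bits` bits each.
fn pack_len(nonce_bits: usize, proofsize: usize) -> usize {
    (nonce_bits * proofsize + 7) / 8
}

fn main() {
    // e.g. 42 nonces at 29 bits each -> 1218 bits -> 153 bytes
    assert_eq!(pack_len(29, 42), 153);
}
```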

// We accumulate bits in it until capacity, at which point we just copy this
// mini buffer to compressed.
let mut mini_buffer: u64 = 0u64;
let mut cursor = 0; //< number of bits written in the mini_buffer.
Contributor: I'd use only remaining, no cursor.

let mut cursor = 0; //< number of bits written in the mini_buffer.
let mut pack_bytes_remaining = compressed.len();
for el in uncompressed {
let remaining = 64 - cursor;
Contributor: This becomes redundant.

Contributor: mini_buffer |= el << (64 - remaining) happens in all 3 cases, and so should be done here before the comparison.

Ordering::Less => {
// Plenty of room remaining in our mini buffer.
mini_buffer |= el << cursor;
cursor += bit_width;
@tromp (Contributor), Dec 3, 2021: This becomes remaining -= bit_width.

compressed = &mut compressed[8..];
pack_bytes_remaining -= 8;
mini_buffer = 0u64;
cursor = 0;
@tromp (Contributor), Dec 3, 2021: This becomes remaining -= 64.

}
Ordering::Greater => {
mini_buffer |= el << cursor;
// We have completed our minibuffer.
Contributor: I'd say overflowed :-)

let mut pack_bytes_remaining = compressed.len();
for el in uncompressed {
let remaining = 64 - cursor;
match bit_width.cmp(&remaining) {
Contributor: There's too much overlap between the Equal and Greater cases to merit separating them.

// mini buffer to compressed.
let mut mini_buffer: u64 = 0u64;
let mut cursor = 0; //< number of bits written in the mini_buffer.
let mut pack_bytes_remaining = compressed.len();
Contributor: No need to keep track of this.

}
}
}
if pack_bytes_remaining > 0 {
Contributor: pack_bytes_remaining is simply compressed.len() % 8.
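Folding the suggestions above together (track only remaining, hoist the common OR above the comparison, drop the Ordering match and pack_bytes_remaining) gives a loop along these lines. This is a sketch of the direction of the review, not the code as merged, under the same assumptions as before (0 < bit_width < 64, each value fits in bit_width bits, output sized to the exact packed length):

```rust
fn pack_bits(bit_width: usize, uncompressed: &[u64], mut compressed: &mut [u8]) {
    assert!(bit_width > 0 && bit_width < 64);
    let mut mini_buffer = 0u64;
    let mut remaining = 64; // free bits left in the mini buffer
    for &el in uncompressed {
        // common to all cases, so done once before the comparison
        mini_buffer |= el << (64 - remaining);
        if bit_width < remaining {
            remaining -= bit_width;
        } else {
            // mini buffer filled (or overflowed): flush it
            compressed[..8].copy_from_slice(&mini_buffer.to_le_bytes());
            compressed = &mut compressed[8..];
            // carry any bits of `el` that did not fit; the flush frees 64 bits
            mini_buffer = el >> remaining;
            remaining = remaining + 64 - bit_width;
        }
    }
    // whatever is left of `compressed` here is the original length % 8
    let tail = compressed.len();
    if tail > 0 {
        compressed[..tail].copy_from_slice(&mini_buffer.to_le_bytes()[..tail]);
    }
}

fn main() {
    let mut buf = [0u8; 2];
    pack_bits(4, &[0x1, 0x2, 0x3, 0x4], &mut buf);
    assert_eq!(buf, [0x21, 0x43]);
}
```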

@tromp (Contributor) left a review:

Looking good; left some suggestions for minor improvement.

@tromp self-requested a review December 3, 2021 22:33
@tromp (Contributor) left a review:

See the comments with suggestions for minor improvement.

We should similarly rewrite the repeated calls to read_number in Proof::read
into a single unpack function.
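As an illustration of that suggestion, a single unpack function might look as follows. This is a hypothetical sketch: the name unpack_bits and its signature are inventions for illustration, not the API of Proof::read.

```rust
// Hypothetical sketch of the inverse of pack_bits: read `count` values of
// `bit_width` bits each back out of a packed little-endian byte slice.
fn unpack_bits(bit_width: usize, count: usize, packed: &[u8]) -> Vec<u64> {
    assert!(bit_width > 0 && bit_width < 64);
    let mask = (1u64 << bit_width) - 1;
    let mut out = Vec::with_capacity(count);
    for i in 0..count {
        let bit_pos = i * bit_width;
        let (byte, bit) = (bit_pos / 8, bit_pos % 8);
        // gather the (at most 9) bytes covering this value into a u128,
        // then shift and mask the value out
        let mut window = 0u128;
        for (j, &b) in packed[byte..].iter().take(9).enumerate() {
            window |= (b as u128) << (8 * j);
        }
        out.push((window >> bit) as u64 & mask);
    }
    out
}

fn main() {
    // round-trips the four 4-bit values packed as [0x21, 0x43]
    assert_eq!(unpack_bits(4, 4, &[0x21, 0x43]), vec![1, 2, 3, 4]);
}
```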

@yeastplume (Member Author):

Thank you; the compression function has been greatly compressed with the suggestions above.

@yeastplume yeastplume merged commit 7725a05 into mimblewimble:master Dec 6, 2021
@yeastplume yeastplume deleted the header_sync_perf branch January 20, 2022 10:01
bayk added a commit to mwcproject/mwc-node that referenced this pull request Jun 21, 2024
…n with more efficient algorithm (mimblewimble#3670)

* replace bitvec with more efficient bitpack algorithm
* optimise proof_unpack_len
* move proof pack length calculation
* small refactor
* integrate suggestions in mimblewimble#3670
* finish compressing compression function
* remove ordering cmp from pack function
* remainder fix for new logic
* remove println statements
* remove ordering import warning