-
Notifications
You must be signed in to change notification settings - Fork 4k
ARROW-10079: [Rust] Benchmark and improve count bits #8663
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
ARROW-10079: [Rust] Benchmark and improve count bits #8663
Conversation
| if self.is_dense() { | ||
| let valid_slots = match self.data.null_buffer() { | ||
| Some(b) => bit_util::count_set_bits_offset(b.data(), 0, index), | ||
| Some(b) => b.count_set_bits_offset(0, index), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Unrelated to this refactoring, but this code in UnionArray did not match my understanding of the array format documentation. At least I don't see any mention that the offsets array is compressed by omitting null values. Or maybe the documentation needs a better example that includes a null bitmap.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh, I got it, the union format actually changed and deprecated the use of a null bitmap: 6df8620
jorgecarleitao
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Really cool, @jhorstmann. LGTM
Left some minor comments, all optional.
This refactors the calculation of null counts by using the bitchunk iterator and
count_onesintrinsic. The performance of array creation improves by between 3%-5%. The biggest impact is on getting a slice of an existing array, for a slice length of 2048 the performance in a microbenchmark doubles.Benchmark results on a Ryzen 3700U. LLVM seems to be able to vectorize the
count_onesintrinsic, so performance on machines with better AVX units should be even higher.