ARROW-2649: [C++] Add GenerateBits() function to improve bitmap writing performance #2093

pitrou · 2018-05-31T15:40:34Z

Also a GenerateBitsUnrolled() for higher performance where warranted.

Benchmarks:

- GenerateBits is 1.8x faster than BitmapWriter
- GenerateBitsUnrolled is 2.9x faster than BitmapWriter
- BooleanBuilder is now 3x faster than with BitmapWriter (and around 9x faster than it was with SetBitTo calls)
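For readers unfamiliar with the idea, here is a minimal sketch of what a generator-driven bitmap writer can look like. It is not the code from this PR: the names `GenerateBitsSketch` and `PackNonZero` are hypothetical, and the argument order (bitmap, start offset, length, generator) is only inferred from the call shown in the builder.cc diff. It assumes bits are stored least-significant first within each byte.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical sketch (not the PR's implementation): write `length` bits into
// `bitmap` starting at bit `start_offset`, taking each bit from a call to g().
// Bits are numbered least-significant first within each byte.
template <class Generator>
void GenerateBitsSketch(uint8_t* bitmap, int64_t start_offset, int64_t length,
                        Generator&& g) {
  if (length <= 0) return;
  uint8_t* cur = bitmap + start_offset / 8;
  uint8_t bit_mask = static_cast<uint8_t>(1u << (start_offset % 8));
  // Keep the bits that precede start_offset in the first byte.
  uint8_t current_byte = static_cast<uint8_t>(*cur & (bit_mask - 1));
  for (int64_t i = 0; i < length; ++i) {
    if (g()) current_byte = static_cast<uint8_t>(current_byte | bit_mask);
    bit_mask = static_cast<uint8_t>(bit_mask << 1);
    if (bit_mask == 0) {  // byte is full: store it and start a new one
      *cur++ = current_byte;
      current_byte = 0;
      bit_mask = 1;
    }
  }
  // Store the trailing partial byte; its unwritten high bits are zeroed.
  if (bit_mask != 1) *cur = current_byte;
}

// Example use: pack a vector of ints into a bitmap (non-zero means "set").
inline std::vector<uint8_t> PackNonZero(const std::vector<int>& values) {
  std::vector<uint8_t> bitmap((values.size() + 7) / 8, 0);
  std::size_t i = 0;
  GenerateBitsSketch(bitmap.data(), 0, static_cast<int64_t>(values.size()),
                     [&values, &i]() { return values[i++] != 0; });
  return bitmap;
}
```

The point of the design is that the current byte is accumulated in a local variable and stored once per byte, rather than doing a read-modify-write on the bitmap for every bit.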

codecov-io · 2018-05-31T17:04:50Z

Codecov Report

Merging #2093 into master will increase coverage by 0.01%.
The diff coverage is 97.95%.

@@            Coverage Diff             @@
##           master    #2093      +/-   ##
==========================================
+ Coverage   86.35%   86.36%   +0.01%     
==========================================
  Files         230      230              
  Lines       40392    40452      +60     
==========================================
+ Hits        34880    34937      +57     
- Misses       5512     5515       +3

Impacted Files	Coverage Δ
cpp/src/arrow/compute/kernels/cast.cc	`89.35% <100%> (-0.14%)`	⬇️
cpp/src/arrow/builder.cc	`81.79% <100%> (-0.43%)`	⬇️
cpp/src/arrow/util/bit-util.h	`98.5% <100%> (+0.49%)`	⬆️
cpp/src/arrow/util/bit-util-test.cc	`99.45% <94.28%> (-0.55%)`	⬇️
cpp/src/arrow/util/thread-pool-test.cc	`98.91% <0%> (-0.55%)`	⬇️

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update d19089e...0ef0d12.

wesm

+1, this is awesome! I'm OK to merge whenever you're satisfied enough with the code to remove the WIP.

wesm · 2018-05-31T19:58:35Z

cpp/src/arrow/builder.cc

-  bit_writer.Finish();
+  int64_t i = 0;
+  internal::GenerateBitsUnrolled(raw_data_, length_, length,
+                                 [values, &i]() -> bool { return values[i++] != 0; });


I have often wondered if lambda functions have much overhead vs. inlined functions, is there a good reference on how the various compilers behave?

A bit of Googling suggests that in instances like this (where the type of the lambda is a template argument), the lambda will be inlined (https://www.quora.com/Are-C++-lambda-functions-always-inlined). If you passed a lambda into a function accepting a std::function of some kind, it wouldn't necessarily be inlined.

pitrou · 2018-06-02

Yes, apparently the recommended idiom is to let the callable argument be a template parameter so as to select a favorable specialization.
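To illustrate the two ways of accepting a callable discussed in this thread, here is a small sketch. The function names `CountTrueTemplated` and `CountTrueErased` are hypothetical and exist only for this example. Whether a particular compiler actually inlines either call depends on the compiler and optimization level, so the comments describe what is typically possible rather than what is guaranteed.

```cpp
#include <cstdint>
#include <functional>

// The callable's type is a template parameter: each lambda produces its own
// instantiation, so the compiler can see the lambda body at the call site
// and is usually able to inline it.
template <class Generator>
int64_t CountTrueTemplated(int64_t n, Generator&& g) {
  int64_t count = 0;
  for (int64_t i = 0; i < n; ++i) count += g() ? 1 : 0;
  return count;
}

// The callable is type-erased behind std::function: every call goes through
// an indirection that the compiler may or may not be able to see through.
inline int64_t CountTrueErased(int64_t n, const std::function<bool()>& g) {
  int64_t count = 0;
  for (int64_t i = 0; i < n; ++i) count += g() ? 1 : 0;
  return count;
}
```

Both functions accept the same lambda, for example `[values, &i]() -> bool { return values[i++] != 0; }`, and return the same result; only the way the call is dispatched differs.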

wesm · 2018-05-31T20:12:42Z

cpp/src/arrow/util/bit-util.h

+  int64_t remaining_bytes = remaining / 8;
+  while (remaining_bytes-- > 0) {
+    current_byte = 0;
+    current_byte = g() ? current_byte | 0x01 : current_byte;


Out of curiosity, would current_byte = current_byte | (0x01 * static_cast<uint8_t>(g())) have any better performance (to avoid branching)? I guess it's possible the compiler is doing some kind of optimization anyway.

pitrou · 2018-06-02

I've tried it quickly and, while the BooleanBuilder benchmark isn't affected, the bit-util microbenchmark became 2x faster. I'm wondering whether, in this trivial case, the whole thing is perhaps SIMDed by the compiler. I should take a closer look.

(this is with gcc 4.9 on an AMD Ryzen)
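The two formulations compared in this thread can be put side by side as follows. This is a sketch with hypothetical helper names (`PackByteTernary`, `PackByteBranchless`), written as loops rather than the unrolled form in the diff. It only shows that both produce the same byte; it makes no claim about which is faster, since that depends on the compiler and on what the generator does.

```cpp
#include <cstdint>

// Ternary form, in the style of the diff: conditionally OR in the mask.
template <class Generator>
uint8_t PackByteTernary(Generator&& g) {
  uint8_t current_byte = 0;
  for (int bit = 0; bit < 8; ++bit) {
    const uint8_t mask = static_cast<uint8_t>(1u << bit);
    current_byte = g() ? static_cast<uint8_t>(current_byte | mask) : current_byte;
  }
  return current_byte;
}

// Arithmetic form, in the style of the review suggestion: multiply the mask
// by the bit value (0 or 1) so that no conditional appears in the source.
template <class Generator>
uint8_t PackByteBranchless(Generator&& g) {
  uint8_t current_byte = 0;
  for (int bit = 0; bit < 8; ++bit) {
    const uint8_t mask = static_cast<uint8_t>(1u << bit);
    const bool value = g();  // normalize to exactly 0 or 1 before multiplying
    current_byte = static_cast<uint8_t>(current_byte | (mask * static_cast<uint8_t>(value)));
  }
  return current_byte;
}
```

A compiler is free to turn either form into the same machine code, which is consistent with the observation above that one benchmark changed while the other did not.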

wesm · 2018-06-08T19:47:17Z

+1, merging this. We can do further performance explorations in follow-up patches.

wesm approved these changes May 31, 2018

wesm changed the title ~~[WIP] ARROW-2649: [C++] Add GenerateBits() function~~ ARROW-2649: [C++] Add GenerateBits() function Jun 8, 2018

wesm changed the title ~~ARROW-2649: [C++] Add GenerateBits() function~~ ARROW-2649: [C++] Add GenerateBits() function to improve bitmap writing performance Jun 8, 2018

wesm closed this in 27b869a Jun 8, 2018

pitrou deleted the ARROW-2649-generate-bits branch March 2, 2021 16:56
