ARROW-10079: [Rust] Benchmark and improve count bits #8663

jhorstmann · 2020-11-14T16:43:39Z

This refactors the calculation of null counts by using the bitchunk iterator and count_ones intrinsic. The performance of array creation improves by between 3%-5%. The biggest impact is on getting a slice of an existing array, for a slice length of 2048 the performance in a microbenchmark doubles.

Benchmark results on a Ryzen 3700U. LLVM seems to be able to vectorize the count_ones intrinsic, so performance on machines with better AVX units should be even higher.

Running /home/jhorstmann/Source/github/apache/arrow/rust/target/release/deps/array_slice-d438c5aeed9bef19
Gnuplot not found, using plotters backend
array_slice 128         time:   [150.68 ns 151.74 ns 153.14 ns]                            
                        change: [-11.525% -10.357% -9.3284%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 8 outliers among 100 measurements (8.00%)
  3 (3.00%) high mild
  5 (5.00%) high severe

array_slice 512         time:   [158.77 ns 161.03 ns 163.62 ns]                            
                        change: [-25.922% -24.956% -23.945%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 13 outliers among 100 measurements (13.00%)
  5 (5.00%) high mild
  8 (8.00%) high severe

array_slice 2048        time:   [170.17 ns 171.24 ns 172.60 ns]                             
                        change: [-50.388% -49.865% -49.375%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 6 outliers among 100 measurements (6.00%)
  2 (2.00%) high mild
  4 (4.00%) high severe

Running /home/jhorstmann/Source/github/apache/arrow/rust/target/release/deps/array_from_vec-8a972c208e7a6334
Gnuplot not found, using plotters backend
array_from_vec 128      time:   [750.62 ns 751.69 ns 752.85 ns]                                
                        change: [-8.2512% -7.3658% -6.5917%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 6 outliers among 100 measurements (6.00%)
  3 (3.00%) high mild
  3 (3.00%) high severe

array_from_vec 256      time:   [1.2501 us 1.2580 us 1.2676 us]                                
                        change: [-0.9517% +0.1364% +1.1850%] (p = 0.81 > 0.05)
                        No change in performance detected.
Found 11 outliers among 100 measurements (11.00%)
  7 (7.00%) high mild
  4 (4.00%) high severe

array_from_vec 512      time:   [2.1603 us 2.1643 us 2.1690 us]                                
                        change: [-2.8122% -2.1984% -1.6379%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 13 outliers among 100 measurements (13.00%)
  10 (10.00%) high mild
  3 (3.00%) high severe

array_string_from_vec 128                                                                             
                        time:   [3.2196 us 3.2288 us 3.2395 us]
                        change: [+1.4107% +1.9839% +2.5419%] (p = 0.00 < 0.05)
                        Performance has regressed.
Found 12 outliers among 100 measurements (12.00%)
  7 (7.00%) high mild
  5 (5.00%) high severe

array_string_from_vec 256                                                                             
                        time:   [4.8112 us 4.8352 us 4.8685 us]
                        change: [-5.2564% -3.8312% -2.7672%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 14 outliers among 100 measurements (14.00%)
  4 (4.00%) high mild
  10 (10.00%) high severe

array_string_from_vec 512                                                                             
                        time:   [8.2588 us 8.2802 us 8.3049 us]
                        change: [-0.5974% -0.2991% +0.0027%] (p = 0.05 > 0.05)
                        No change in performance detected.
Found 7 outliers among 100 measurements (7.00%)
  3 (3.00%) high mild
  4 (4.00%) high severe

struct_array_from_vec 128                                                                             
                        time:   [4.5771 us 4.5947 us 4.6206 us]
                        change: [-6.6706% -5.3873% -4.2340%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 15 outliers among 100 measurements (15.00%)
  5 (5.00%) high mild
  10 (10.00%) high severe

struct_array_from_vec 256                                                                             
                        time:   [6.7617 us 6.7887 us 6.8182 us]
                        change: [-4.9262% -4.2419% -3.6528%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 2 outliers among 100 measurements (2.00%)
  2 (2.00%) high mild

struct_array_from_vec 512                                                                             
                        time:   [9.8926 us 9.9639 us 10.052 us]
                        change: [-5.4057% -3.9331% -2.5242%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 7 outliers among 100 measurements (7.00%)
  2 (2.00%) high mild
  5 (5.00%) high severe

struct_array_from_vec 1024                                                                             
                        time:   [15.812 us 15.862 us 15.922 us]
                        change: [-6.1966% -5.6634% -5.0338%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 11 outliers among 100 measurements (11.00%)
  3 (3.00%) high mild
  8 (8.00%) high severe

Running /home/jhorstmann/Source/github/apache/arrow/rust/target/release/deps/builder-26beb41d8ad91167
Gnuplot not found, using plotters backend
bench_primitive         time:   [4.8277 ms 4.8879 ms 4.9558 ms]                             
                        thrpt:  [807.14 MiB/s 818.35 MiB/s 828.55 MiB/s]
                 change:
                        time:   [-5.3415% -3.4401% -1.3973%] (p = 0.00 < 0.05)
                        thrpt:  [+1.4171% +3.5626% +5.6429%]
                        Performance has improved.
Found 7 outliers among 100 measurements (7.00%)
  4 (4.00%) high mild
  3 (3.00%) high severe

bench_bool              time:   [2.5176 ms 2.5230 ms 2.5292 ms]                        
                        thrpt:  [197.69 MiB/s 198.18 MiB/s 198.60 MiB/s]
                 change:
                        time:   [-19.412% -19.203% -18.988%] (p = 0.00 < 0.05)
                        thrpt:  [+23.438% +23.767% +24.088%]
                        Performance has improved.
Found 8 outliers among 100 measurements (8.00%)
  3 (3.00%) high mild
  5 (5.00%) high severe

…_ones intrinsic

github-actions · 2020-11-14T16:46:42Z

https://issues.apache.org/jira/browse/ARROW-10079

jhorstmann · 2020-11-14T16:58:47Z

rust/arrow/src/array/array_union.rs

        if self.is_dense() {
            let valid_slots = match self.data.null_buffer() {
-                Some(b) => bit_util::count_set_bits_offset(b.data(), 0, index),
+                Some(b) => b.count_set_bits_offset(0, index),


Unrelated to this refactoring, but this code in UnionArray did not match my understanding of the array format documentation. At least I don't see any mention that the offsets array is compressed by omitting null values. Or maybe the documentation needs a better example that includes a null bitmap.

Oh, I got it, the union format actually changed and deprecated the use of a null bitmap: 6df8620

jorgecarleitao

Really cool, @jhorstmann. LGTM

Left some minor comments, all optional.

rust/arrow/src/buffer.rs

jhorstmann added 3 commits November 14, 2020 16:43

ARROW-10079 Benchmark for creating array slices

19f566e

ARROW-10079 Refactor count_set_bits to use chunked iterator and count…

6a877ca

…_ones intrinsic

ARROW-10070 Move count_set_bits tests to buffer.rs

1a12dc6

github-actions bot added the Component: Rust label Nov 14, 2020

jhorstmann commented Nov 14, 2020

View reviewed changes

ARROW-10079 Add comment about union validity bitmap

f1b0452

jorgecarleitao approved these changes Nov 14, 2020

View reviewed changes

rust/arrow/src/buffer.rs Outdated Show resolved Hide resolved

rust/arrow/src/buffer.rs Show resolved Hide resolved

ARROW-10070 Additional test and comments

d7ad888

nevi-me approved these changes Nov 15, 2020

View reviewed changes

nevi-me closed this in 5666dca Nov 15, 2020

nevi-me mentioned this pull request Nov 15, 2020

ARROW-10588: [Rust] Safe and parallel bit operations for Arrow #8664

Closed

asfimport mentioned this pull request Nov 15, 2020

[Rust]: Benchmark and improve count_set_bits function #26096

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

ARROW-10079: [Rust] Benchmark and improve count bits #8663

ARROW-10079: [Rust] Benchmark and improve count bits #8663

Uh oh!

jhorstmann commented Nov 14, 2020

Uh oh!

github-actions bot commented Nov 14, 2020

Uh oh!

jhorstmann Nov 14, 2020

Uh oh!

jhorstmann Nov 14, 2020

Uh oh!

jorgecarleitao left a comment

Uh oh!

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

ARROW-10079: [Rust] Benchmark and improve count bits #8663

ARROW-10079: [Rust] Benchmark and improve count bits #8663

Uh oh!

Conversation

jhorstmann commented Nov 14, 2020

Uh oh!

github-actions bot commented Nov 14, 2020

Uh oh!

jhorstmann Nov 14, 2020

Choose a reason for hiding this comment

Uh oh!

jhorstmann Nov 14, 2020

Choose a reason for hiding this comment

Uh oh!

jorgecarleitao left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants