Skip to content

Conversation

@jhorstmann
Copy link
Contributor

This refactors the calculation of null counts by using the bitchunk iterator and count_ones intrinsic. The performance of array creation improves by between 3%-5%. The biggest impact is on getting a slice of an existing array, for a slice length of 2048 the performance in a microbenchmark doubles.

Benchmark results on a Ryzen 3700U. LLVM seems to be able to vectorize the count_ones intrinsic, so performance on machines with better AVX units should be even higher.

Running /home/jhorstmann/Source/github/apache/arrow/rust/target/release/deps/array_slice-d438c5aeed9bef19
Gnuplot not found, using plotters backend
array_slice 128         time:   [150.68 ns 151.74 ns 153.14 ns]                            
                        change: [-11.525% -10.357% -9.3284%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 8 outliers among 100 measurements (8.00%)
  3 (3.00%) high mild
  5 (5.00%) high severe

array_slice 512         time:   [158.77 ns 161.03 ns 163.62 ns]                            
                        change: [-25.922% -24.956% -23.945%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 13 outliers among 100 measurements (13.00%)
  5 (5.00%) high mild
  8 (8.00%) high severe

array_slice 2048        time:   [170.17 ns 171.24 ns 172.60 ns]                             
                        change: [-50.388% -49.865% -49.375%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 6 outliers among 100 measurements (6.00%)
  2 (2.00%) high mild
  4 (4.00%) high severe
Running /home/jhorstmann/Source/github/apache/arrow/rust/target/release/deps/array_from_vec-8a972c208e7a6334
Gnuplot not found, using plotters backend
array_from_vec 128      time:   [750.62 ns 751.69 ns 752.85 ns]                                
                        change: [-8.2512% -7.3658% -6.5917%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 6 outliers among 100 measurements (6.00%)
  3 (3.00%) high mild
  3 (3.00%) high severe

array_from_vec 256      time:   [1.2501 us 1.2580 us 1.2676 us]                                
                        change: [-0.9517% +0.1364% +1.1850%] (p = 0.81 > 0.05)
                        No change in performance detected.
Found 11 outliers among 100 measurements (11.00%)
  7 (7.00%) high mild
  4 (4.00%) high severe

array_from_vec 512      time:   [2.1603 us 2.1643 us 2.1690 us]                                
                        change: [-2.8122% -2.1984% -1.6379%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 13 outliers among 100 measurements (13.00%)
  10 (10.00%) high mild
  3 (3.00%) high severe

array_string_from_vec 128                                                                             
                        time:   [3.2196 us 3.2288 us 3.2395 us]
                        change: [+1.4107% +1.9839% +2.5419%] (p = 0.00 < 0.05)
                        Performance has regressed.
Found 12 outliers among 100 measurements (12.00%)
  7 (7.00%) high mild
  5 (5.00%) high severe

array_string_from_vec 256                                                                             
                        time:   [4.8112 us 4.8352 us 4.8685 us]
                        change: [-5.2564% -3.8312% -2.7672%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 14 outliers among 100 measurements (14.00%)
  4 (4.00%) high mild
  10 (10.00%) high severe

array_string_from_vec 512                                                                             
                        time:   [8.2588 us 8.2802 us 8.3049 us]
                        change: [-0.5974% -0.2991% +0.0027%] (p = 0.05 > 0.05)
                        No change in performance detected.
Found 7 outliers among 100 measurements (7.00%)
  3 (3.00%) high mild
  4 (4.00%) high severe

struct_array_from_vec 128                                                                             
                        time:   [4.5771 us 4.5947 us 4.6206 us]
                        change: [-6.6706% -5.3873% -4.2340%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 15 outliers among 100 measurements (15.00%)
  5 (5.00%) high mild
  10 (10.00%) high severe

struct_array_from_vec 256                                                                             
                        time:   [6.7617 us 6.7887 us 6.8182 us]
                        change: [-4.9262% -4.2419% -3.6528%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 2 outliers among 100 measurements (2.00%)
  2 (2.00%) high mild

struct_array_from_vec 512                                                                             
                        time:   [9.8926 us 9.9639 us 10.052 us]
                        change: [-5.4057% -3.9331% -2.5242%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 7 outliers among 100 measurements (7.00%)
  2 (2.00%) high mild
  5 (5.00%) high severe

struct_array_from_vec 1024                                                                             
                        time:   [15.812 us 15.862 us 15.922 us]
                        change: [-6.1966% -5.6634% -5.0338%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 11 outliers among 100 measurements (11.00%)
  3 (3.00%) high mild
  8 (8.00%) high severe
Running /home/jhorstmann/Source/github/apache/arrow/rust/target/release/deps/builder-26beb41d8ad91167
Gnuplot not found, using plotters backend
bench_primitive         time:   [4.8277 ms 4.8879 ms 4.9558 ms]                             
                        thrpt:  [807.14 MiB/s 818.35 MiB/s 828.55 MiB/s]
                 change:
                        time:   [-5.3415% -3.4401% -1.3973%] (p = 0.00 < 0.05)
                        thrpt:  [+1.4171% +3.5626% +5.6429%]
                        Performance has improved.
Found 7 outliers among 100 measurements (7.00%)
  4 (4.00%) high mild
  3 (3.00%) high severe

bench_bool              time:   [2.5176 ms 2.5230 ms 2.5292 ms]                        
                        thrpt:  [197.69 MiB/s 198.18 MiB/s 198.60 MiB/s]
                 change:
                        time:   [-19.412% -19.203% -18.988%] (p = 0.00 < 0.05)
                        thrpt:  [+23.438% +23.767% +24.088%]
                        Performance has improved.
Found 8 outliers among 100 measurements (8.00%)
  3 (3.00%) high mild
  5 (5.00%) high severe

@github-actions
Copy link

if self.is_dense() {
let valid_slots = match self.data.null_buffer() {
Some(b) => bit_util::count_set_bits_offset(b.data(), 0, index),
Some(b) => b.count_set_bits_offset(0, index),
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unrelated to this refactoring, but this code in UnionArray did not match my understanding of the array format documentation. At least I don't see any mention that the offsets array is compressed by omitting null values. Or maybe the documentation needs a better example that includes a null bitmap.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh, I got it, the union format actually changed and deprecated the use of a null bitmap: 6df8620

Copy link
Member

@jorgecarleitao jorgecarleitao left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Really cool, @jhorstmann. LGTM

Left some minor comments, all optional.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants