Optimize comparison kernels by Dandandan · Pull Request #17 · jorgecarleitao/arrow2

Dandandan · 2021-03-21T13:10:25Z

This avoids conditional jumps, which is faster when the branch pattern is not highly predictable (which is the case for %10).
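To illustrate the idea, here is a minimal sketch of a branchy and a branch-free way to pack comparison results into a bitmap. It is not the code from this PR: the function names `lt_branchy` and `lt_branchless` and the plain `Vec<u8>` bitmap (least significant bit first) are assumptions made for the example, and an optimizing compiler may already turn the first version into branch-free code.

```rust
/// Branchy variant: one conditional jump per element decides whether to set a bit.
/// Hypothetical helper for illustration only.
fn lt_branchy(lhs: &[f32], rhs: &[f32]) -> Vec<u8> {
    assert_eq!(lhs.len(), rhs.len());
    let mut out = vec![0u8; (lhs.len() + 7) / 8];
    for (i, (a, b)) in lhs.iter().zip(rhs).enumerate() {
        if a < b {
            out[i / 8] |= 1u8 << (i % 8);
        }
    }
    out
}

/// Branch-free variant: the comparison result is cast to 0 or 1 and shifted
/// into place, so the inner loop has no data-dependent jump.
/// Hypothetical helper for illustration only.
fn lt_branchless(lhs: &[f32], rhs: &[f32]) -> Vec<u8> {
    assert_eq!(lhs.len(), rhs.len());
    let mut out = Vec::with_capacity((lhs.len() + 7) / 8);
    // Each chunk of up to 8 values produces exactly one output byte.
    for (l, r) in lhs.chunks(8).zip(rhs.chunks(8)) {
        let mut byte = 0u8;
        for (i, (a, b)) in l.iter().zip(r).enumerate() {
            byte |= ((a < b) as u8) << i;
        }
        out.push(byte);
    }
    out
}
```

Both functions return the same bitmap for the same inputs. The second one does the same amount of work per element regardless of how many comparisons are true, which is why its cost does not depend on how predictable the data is.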

I also "fixed" the benchmarks: they currently run on two arrays with equal elements, because both arrays use the same seed. I also introduced an lt version.

For arrays with a high "matching percentage" this has a huge performance impact (see the lt benchmark, which is 5-10x faster), but with a low matching percentage (eq on random floats) comparing two arrays is slower. The eq benchmark will now probably never match, so the CPU can predict those branches perfectly in the version on main.

Benchmarking eq Float32: Collecting 100 samples in estimated 5.1909 s (121k ite
eq Float32              time:   [41.836 us 41.890 us 41.947 us]
                        change: [+65.172% +65.623% +66.045%] (p = 0.00 < 0.05)
                        Performance has regressed.
Found 2 outliers among 100 measurements (2.00%)
  2 (2.00%) high mild

Benchmarking eq scalar Float32: Collecting 100 samples in estimated 5.0676 s (2
eq scalar Float32       time:   [20.758 us 20.789 us 20.826 us]
                        change: [-7.0110% -6.7414% -6.5092%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 7 outliers among 100 measurements (7.00%)
  2 (2.00%) low mild
  3 (3.00%) high mild
  2 (2.00%) high severe

Benchmarking lt Float32: Collecting 100 samples in estimated 5.1891 s (136k ite
lt Float32              time:   [37.666 us 37.725 us 37.789 us]
                        change: [-82.416% -82.241% -82.070%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 2 outliers among 100 measurements (2.00%)
  2 (2.00%) high mild

Benchmarking lt scalar Float32: Collecting 100 samples in estimated 5.0062 s (2
lt scalar Float32       time:   [21.252 us 21.272 us 21.290 us]
                        change: [-90.458% -90.437% -90.415%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 6 outliers among 100 measurements (6.00%)
  2 (2.00%) low mild
  2 (2.00%) high mild
  2 (2.00%) high severe

codecov-io · 2021-03-21T13:18:10Z

Codecov Report

Merging #17 (0e4cba3) into main (65d434c) will decrease coverage by 0.00%.
The diff coverage is 100.00%.

@@            Coverage Diff             @@
##             main      #17      +/-   ##
==========================================
- Coverage   63.31%   63.31%   -0.01%     
==========================================
  Files         150      150              
  Lines       14424    14422       -2     
==========================================
- Hits         9133     9131       -2     
  Misses       5291     5291

Impacted Files	Coverage Δ
src/compute/comparison/primitive.rs	`94.97% <100.00%> (-0.06%)`	⬇️


Dandandan · 2021-03-21T13:19:42Z

There is also a strange part in the benchmark here.

    let arr_a = create_primitive_array::<f32>(size, DataType::Float32, 0.0);
    let arr_b = create_primitive_array::<f32>(size, DataType::Float32, 0.0);

As the left & right arrays are using the same seed, they have exactly the same elements!
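The effect can be reproduced with a small sketch. `random_f32s` below is a hypothetical stand-in for a seeded benchmark helper, not the `create_primitive_array` function used in the benchmark, and the xorshift generator is only there so the example needs no external crates.

```rust
/// Hypothetical seeded generator (an assumption for this example, not arrow2's API).
/// Returns `size` pseudo-random values in [0, 1) that depend only on `seed`.
fn random_f32s(size: usize, seed: u64) -> Vec<f32> {
    // xorshift64 must not start from zero, so force the lowest bit on
    // after scrambling the seed with an odd multiplier.
    let mut state = seed.wrapping_mul(0x9E37_79B9_7F4A_7C15) | 1;
    (0..size)
        .map(|_| {
            state ^= state << 13;
            state ^= state >> 7;
            state ^= state << 17;
            // Keep the top 24 bits and scale them into [0, 1).
            (state >> 40) as f32 / (1u64 << 24) as f32
        })
        .collect()
}

/// Fraction of positions where `a[i] == b[i]`.
fn eq_fraction(a: &[f32], b: &[f32]) -> f64 {
    let matches = a.iter().zip(b).filter(|(x, y)| x == y).count();
    matches as f64 / a.len() as f64
}
```

With the same seed on both sides, `eq_fraction(&random_f32s(1024, 0), &random_f32s(1024, 0))` is exactly 1.0, so an eq benchmark built this way always takes the same branch. Passing different seeds gives two different arrays.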

jorgecarleitao · 2021-03-21T16:34:58Z

Thanks @Dandandan. It was a lot of fun to go through this with you. :)

Optimize comparison kernels

0e4cba3

Dandandan mentioned this pull request Mar 21, 2021

ARROW-12032: [Rust] Optimize comparison kernels apache/arrow#9759

Closed

Dandandan added 2 commits March 21, 2021 14:22

Create arrays with different seeds

4e74bf8

Use value that matches in 50% of cases

f8c7e0d

jorgecarleitao merged commit e7807f2 into jorgecarleitao:main Mar 21, 2021

jorgecarleitao added the enhancement An improvement to an existing feature label Apr 20, 2021

jorgecarleitao merged 3 commits into jorgecarleitao:main from Dandandan:comparison_kernels