Skip to content
This repository was archived by the owner on Feb 18, 2024. It is now read-only.

Optimize comparison kernels#17

Merged
jorgecarleitao merged 3 commits intojorgecarleitao:mainfrom
Dandandan:comparison_kernels
Mar 21, 2021
Merged

Optimize comparison kernels#17
jorgecarleitao merged 3 commits intojorgecarleitao:mainfrom
Dandandan:comparison_kernels

Conversation

@Dandandan
Copy link
Copy Markdown
Collaborator

@Dandandan Dandandan commented Mar 21, 2021

This avoids the conditional jumps, which is faster when the pattern is not super predictable (which is the case for %10).

I also "fixed" the benchmarks, as they currently run on two arrays with equal elements as they use the same seed and introduced a lt version.

For arrays with a high "matching percentage" this has a huge performance impact (see lt benchmark, this is 5-10x faster), but a low matching percentage (eq on random floats) it is slower to compare two arrays. The eq benchmark will now match probably never, so the CPU can predict those branches perfectly in the version on main.

Benchmarking eq Float32: Collecting 100 samples in estimated 5.1909 s (121k ite                                                                               eq Float32              time:   [41.836 us 41.890 us 41.947 us]
                        change: [+65.172% +65.623% +66.045%] (p = 0.00 < 0.05)
                        Performance has regressed.
Found 2 outliers among 100 measurements (2.00%)
  2 (2.00%) high mild

Benchmarking eq scalar Float32: Collecting 100 samples in estimated 5.0676 s (2                                                                               eq scalar Float32       time:   [20.758 us 20.789 us 20.826 us]
                        change: [-7.0110% -6.7414% -6.5092%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 7 outliers among 100 measurements (7.00%)
  2 (2.00%) low mild
  3 (3.00%) high mild
  2 (2.00%) high severe

Benchmarking lt Float32: Collecting 100 samples in estimated 5.1891 s (136k ite                                                                               lt Float32              time:   [37.666 us 37.725 us 37.789 us]
                        change: [-82.416% -82.241% -82.070%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 2 outliers among 100 measurements (2.00%)
  2 (2.00%) high mild

Benchmarking lt scalar Float32: Collecting 100 samples in estimated 5.0062 s (2                                                                               lt scalar Float32       time:   [21.252 us 21.272 us 21.290 us]
                        change: [-90.458% -90.437% -90.415%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 6 outliers among 100 measurements (6.00%)
  2 (2.00%) low mild
  2 (2.00%) high mild
  2 (2.00%) high severe

@codecov-io
Copy link
Copy Markdown

Codecov Report

Merging #17 (0e4cba3) into main (65d434c) will decrease coverage by 0.00%.
The diff coverage is 100.00%.

Impacted file tree graph

@@            Coverage Diff             @@
##             main      #17      +/-   ##
==========================================
- Coverage   63.31%   63.31%   -0.01%     
==========================================
  Files         150      150              
  Lines       14424    14422       -2     
==========================================
- Hits         9133     9131       -2     
  Misses       5291     5291              
Impacted Files Coverage Δ
src/compute/comparison/primitive.rs 94.97% <100.00%> (-0.06%) ⬇️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 65d434c...0e4cba3. Read the comment docs.

@Dandandan
Copy link
Copy Markdown
Collaborator Author

There is also a strange part in the benchmark here.

    let arr_a = create_primitive_array::<f32>(size, DataType::Float32, 0.0);
    let arr_b = create_primitive_array::<f32>(size, DataType::Float32, 0.0);

As the left & right arrays are using the same seed, they have exactly the same elements!

@jorgecarleitao jorgecarleitao merged commit e7807f2 into jorgecarleitao:main Mar 21, 2021
@jorgecarleitao
Copy link
Copy Markdown
Owner

Thanks @Dandandan . It was a lot of fun to go through this with you. :)

@jorgecarleitao jorgecarleitao added the enhancement An improvement to an existing feature label Apr 20, 2021
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.

Labels

enhancement An improvement to an existing feature

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants