ARROW-10304: [C++][Compute] Optimize variance kernel for integers #8466

cyb70289 · 2020-10-15T03:23:20Z

Improve variance kernel performance for integers by using the
textbook one-pass algorithm with integer arithmetic.
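
For readers unfamiliar with the term, here is a minimal sketch of the textbook one-pass formula computed with exact integer accumulation. It is an illustration only and not the code in this PR: the function name `OnePassVarianceInt32` is hypothetical, it computes the population variance, it assumes fewer than 2^31 values so the 128-bit intermediate cannot overflow, and it relies on `__int128`, which is a GCC/Clang extension.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Population variance of int32 values via the textbook one-pass formula
//   var = (n * sum(x^2) - sum(x)^2) / n^2
// Both sums are accumulated exactly in integers; floating point is only
// used for the final division.
// Assumption: fewer than 2^31 values, so nothing overflows 128 bits.
double OnePassVarianceInt32(const std::vector<int32_t>& values) {
  const std::size_t n = values.size();
  if (n == 0) return 0.0;  // illustrative choice for empty input

  int64_t sum = 0;          // |sum| <= 2^62 under the assumption above
  __int128 square_sum = 0;  // GCC/Clang extension
  for (int32_t v : values) {
    sum += v;
    square_sum += static_cast<int64_t>(v) * v;
  }

  // Exact integer numerator: n * sum(x^2) - sum(x)^2
  const __int128 numerator =
      static_cast<__int128>(n) * square_sum - static_cast<__int128>(sum) * sum;
  const double count = static_cast<double>(n);
  return static_cast<double>(numerator) / (count * count);
}
```

The one-pass formula is usually avoided for floating-point input because the subtraction can cancel catastrophically. With integer accumulators the subtraction is exact, which is what makes the single pass safe here.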

github-actions · 2020-10-15T03:27:37Z

https://issues.apache.org/jira/browse/ARROW-10304

cyb70289 · 2020-10-15T03:28:18Z

NOTE: Benchmark PR #8407 is not merged yet. You need to pull that PR manually to evaluate performance.

Tested on Xeon Gold 5218, clang-9.

                             benchmark         baseline        contender  change %
7        VarianceKernelInt32/1048576/0    1.805 GiB/sec    6.896 GiB/sec   282.033   'null_percent': 0.0
8    VarianceKernelInt32/1048576/10000    1.751 GiB/sec    5.113 GiB/sec   191.898  'null_percent': 0.01
16     VarianceKernelInt32/1048576/100    1.139 GiB/sec    2.748 GiB/sec   141.206   'null_percent': 1.0
18      VarianceKernelInt32/1048576/10  862.260 MiB/sec    1.547 GiB/sec    83.714  'null_percent': 10.0
1        VarianceKernelInt32/1048576/2  457.956 MiB/sec  769.333 MiB/sec    67.993  'null_percent': 50.0
9        VarianceKernelInt64/1048576/0    3.596 GiB/sec    4.949 GiB/sec    37.601   'null_percent': 0.0
5    VarianceKernelInt64/1048576/10000    3.484 GiB/sec    4.627 GiB/sec    32.832  'null_percent': 0.01
22     VarianceKernelInt64/1048576/100    2.285 GiB/sec    2.787 GiB/sec    21.955   'null_percent': 1.0
3        VarianceKernelFloat/1048576/2  397.394 MiB/sec  454.485 MiB/sec    14.366  'null_percent': 50.0
10      VarianceKernelFloat/1048576/10  790.793 MiB/sec  854.667 MiB/sec     8.077  'null_percent': 10.0
0        VarianceKernelInt64/1048576/1    1.220 TiB/sec    1.261 TiB/sec     3.353 'null_percent': 100.0
14       VarianceKernelInt32/1048576/1    1.222 TiB/sec    1.254 TiB/sec     2.647 'null_percent': 100.0
21      VarianceKernelDouble/1048576/1    1.206 TiB/sec    1.235 TiB/sec     2.379 'null_percent': 100.0
17       VarianceKernelFloat/1048576/1    1.180 TiB/sec    1.206 TiB/sec     2.208 'null_percent': 100.0
4   VarianceKernelDouble/1048576/10000    3.485 GiB/sec    3.475 GiB/sec    -0.277  'null_percent': 0.01
23      VarianceKernelDouble/1048576/0    3.595 GiB/sec    3.575 GiB/sec    -0.557   'null_percent': 0.0
2      VarianceKernelFloat/1048576/100    1.133 GiB/sec    1.126 GiB/sec    -0.632   'null_percent': 1.0
13       VarianceKernelFloat/1048576/0    1.804 GiB/sec    1.792 GiB/sec    -0.643   'null_percent': 0.0
20   VarianceKernelFloat/1048576/10000    1.750 GiB/sec    1.739 GiB/sec    -0.677  'null_percent': 0.01
11    VarianceKernelDouble/1048576/100    2.287 GiB/sec    2.262 GiB/sec    -1.092   'null_percent': 1.0
19      VarianceKernelDouble/1048576/2  866.210 MiB/sec  836.046 MiB/sec    -3.482  'null_percent': 50.0
6      VarianceKernelDouble/1048576/10    1.658 GiB/sec    1.597 GiB/sec    -3.672  'null_percent': 10.0
15      VarianceKernelInt64/1048576/10    1.687 GiB/sec    1.564 GiB/sec    -7.304  'null_percent': 10.0
12       VarianceKernelInt64/1048576/2  914.036 MiB/sec  789.209 MiB/sec   -13.657  'null_percent': 50.0

cyb70289 · 2020-10-15T05:49:46Z

Converting to draft. Will add an optimization for 64-bit integers.

cyb70289 · 2020-10-16T07:01:26Z

Added the int64 optimization and updated the benchmark results. Ready for review.
Big improvement for int32. Moderate improvement for int64 with few null values.
Some regression for int64 with many null values.

cpp/src/arrow/compute/kernels/aggregate_var_std.cc

pitrou · 2020-10-21T15:30:20Z

Results on an AMD Zen 2 CPU:

VarianceKernelInt32/1048576/10000         140 us          140 us         5030 bytes_per_second=6.98658G/s null_percent=0.01 size=1048.58k
VarianceKernelInt32/1048576/100           216 us          216 us         3267 bytes_per_second=4.5294G/s null_percent=1 size=1048.58k
VarianceKernelInt32/1048576/10            397 us          397 us         1763 bytes_per_second=2.45765G/s null_percent=10 size=1048.58k
VarianceKernelInt32/1048576/2             974 us          974 us          718 bytes_per_second=1026.87M/s null_percent=50 size=1048.58k
VarianceKernelInt32/1048576/1           0.816 us        0.816 us       844145 bytes_per_second=1.1684T/s null_percent=100 size=1048.58k
VarianceKernelInt32/1048576/0             130 us          130 us         5414 bytes_per_second=7.51569G/s null_percent=0 size=1048.58k

VarianceKernelInt64/1048576/10000         135 us          135 us         5174 bytes_per_second=7.22877G/s null_percent=0.01 size=1048.58k
VarianceKernelInt64/1048576/100           260 us          260 us         2682 bytes_per_second=3.7503G/s null_percent=1 size=1048.58k
VarianceKernelInt64/1048576/10            440 us          440 us         1591 bytes_per_second=2.21931G/s null_percent=10 size=1048.58k
VarianceKernelInt64/1048576/2             884 us          884 us          783 bytes_per_second=1.10507G/s null_percent=50 size=1048.58k
VarianceKernelInt64/1048576/1           0.821 us        0.821 us       840316 bytes_per_second=1.16182T/s null_percent=100 size=1048.58k
VarianceKernelInt64/1048576/0             123 us          123 us         5620 bytes_per_second=7.94262G/s null_percent=0 size=1048.58k

VarianceKernelFloat/1048576/10000         366 us          366 us         1909 bytes_per_second=2.66576G/s null_percent=0.01 size=1048.58k
VarianceKernelFloat/1048576/100           751 us          751 us          909 bytes_per_second=1.3003G/s null_percent=1 size=1048.58k
VarianceKernelFloat/1048576/10           1097 us         1097 us          637 bytes_per_second=911.712M/s null_percent=10 size=1048.58k
VarianceKernelFloat/1048576/2            1803 us         1802 us          387 bytes_per_second=554.854M/s null_percent=50 size=1048.58k
VarianceKernelFloat/1048576/1           0.817 us        0.817 us       838993 bytes_per_second=1.1679T/s null_percent=100 size=1048.58k
VarianceKernelFloat/1048576/0             346 us          346 us         2021 bytes_per_second=2.82409G/s null_percent=0 size=1048.58k

VarianceKernelDouble/1048576/10000        184 us          184 us         3751 bytes_per_second=5.30153G/s null_percent=0.01 size=1048.58k
VarianceKernelDouble/1048576/100          372 us          372 us         1869 bytes_per_second=2.62218G/s null_percent=1 size=1048.58k
VarianceKernelDouble/1048576/10           549 us          549 us         1249 bytes_per_second=1.77993G/s null_percent=10 size=1048.58k
VarianceKernelDouble/1048576/2            909 us          909 us          741 bytes_per_second=1099.92M/s null_percent=50 size=1048.58k
VarianceKernelDouble/1048576/1          0.831 us        0.831 us       831173 bytes_per_second=1.14779T/s null_percent=100 size=1048.58k
VarianceKernelDouble/1048576/0            174 us          174 us         4050 bytes_per_second=5.62431G/s null_percent=0 size=1048.58k

I'm curious why Int64 would be faster than Double. Aren't they using the same algorithm? (and Int64 goes through an additional int-to-float conversion for each value)

cyb70289 · 2020-10-22T02:02:56Z

> I'm curious why Int64 would be faster than Double. Aren't they using the same algorithm? (and Int64 goes through an additional int-to-float conversion for each value)

There's no int-to-float conversion in the Int64 summation loop (it sums into an Int128). Integer summation is faster than double summation.
https://quick-bench.com/q/-P9E6tgtXqnVBpVmmN6piaZHeUA
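
To illustrate the point, here is a minimal sketch of summing int64 values into a 128-bit accumulator, with a single conversion to double after the loop. This is not the kernel code from this PR: the names `SumInt64` and `MeanInt64` are hypothetical, and `__int128` is a GCC/Clang extension.

```cpp
#include <cstdint>
#include <vector>

// Sums int64 values into a 128-bit integer accumulator.
// The loop body is pure integer addition: no per-element int-to-float
// conversion, and no overflow for any realistic array length.
__int128 SumInt64(const std::vector<int64_t>& values) {
  __int128 sum = 0;  // GCC/Clang extension
  for (int64_t v : values) {
    sum += v;
  }
  return sum;
}

// The only conversion to floating point happens once, after the loop.
double MeanInt64(const std::vector<int64_t>& values) {
  if (values.empty()) return 0.0;  // illustrative choice for empty input
  return static_cast<double>(SumInt64(values)) /
         static_cast<double>(values.size());
}
```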

pitrou · 2020-10-22T08:20:51Z

Ah, I hadn't noticed the SumType.

pitrou

+1

cyb70289 marked this pull request as draft October 15, 2020 05:48

cyb70289 marked this pull request as ready for review October 16, 2020 07:01

pitrou reviewed Oct 19, 2020

kszucs force-pushed the master branch from 953009f to 04660f8 Compare October 19, 2020 18:00

ARROW-10304: [C++][Compute] Optimize variance kernel for integers

ce6ea45

Improve variance kernel performance for integers by using the textbook one-pass algorithm with integer arithmetic.

pitrou approved these changes Oct 22, 2020

pitrou closed this in e2d8dc3 Oct 22, 2020

cyb70289 deleted the variance-integer branch October 22, 2020 08:59

asfimport mentioned this pull request Oct 22, 2020

[C++][Compute] Optimize variance kernel for integers #26295

Closed
