Improve performance for `cudf::contains` when searching for a scalar #11202

ttnghia · 2022-07-06T04:42:02Z

The current implementation of `cudf::contains(column_view, scalar)` uses `thrust::find` and `thrust::any_of` (which calls `thrust::find_if` under the hood). These Thrust APIs are known to suffer from a performance regression (NVIDIA/cccl#720).

This PR replaces `thrust::find` and `thrust::any_of` in `cudf::contains` with `thrust::count_if`, which improves performance significantly.
Benchmarks show that run time drops by as much as 80%, a speedup of up to about 5x.
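To illustrate why the swap preserves the result, here is a minimal CPU-only sketch using the standard-library counterparts `std::find_if` and `std::count_if`. It is not the cudf implementation, which runs on the device with Thrust and cudf's own row comparators. The names `nullable_column`, `contains_find`, and `contains_count` are hypothetical, and nulls are modeled with `std::optional` under the assumption that a null row never matches a valid needle. Both functions answer the same question: does any row equal the needle?

```cpp
#include <algorithm>
#include <cstdint>
#include <optional>
#include <vector>

// Hypothetical stand-in for a nullable INT32 column: std::nullopt models a null row.
// Assumption for this sketch: a null row never matches a valid needle.
using nullable_column = std::vector<std::optional<int32_t>>;

// Early-exit formulation, analogous to the old thrust::find / thrust::any_of approach:
// stop at the first matching row.
bool contains_find(nullable_column const& haystack, int32_t needle)
{
  auto const it = std::find_if(haystack.begin(), haystack.end(), [needle](auto const& v) {
    return v.has_value() && *v == needle;
  });
  return it != haystack.end();
}

// Full-scan formulation, analogous to the thrust::count_if approach:
// count every matching row, then test whether the count is non-zero.
bool contains_count(nullable_column const& haystack, int32_t needle)
{
  auto const matches = std::count_if(haystack.begin(), haystack.end(), [needle](auto const& v) {
    return v.has_value() && *v == needle;
  });
  return matches > 0;
}
```

The trade-off is that `count_if` gives up early termination and always scans every row. On the GPU that is acceptable because counting maps onto a single device-wide reduction, whereas `find_if` has to support stopping early; on a CPU, `std::find_if` will usually win, so this sketch shows only the semantic equivalence, not the performance behavior.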

Closes #3806.

ttnghia · 2022-07-06T04:43:00Z

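Here are the benchmark results comparing performance after this PR against before it. The Time and CPU columns give the relative change (negative values are improvements); the Old/New columns are in milliseconds, rounded to whole numbers: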

Benchmark                                                              Time             CPU      Time Old      Time New       CPU Old       CPU New
---------------------------------------------------------------------------------------------------------------------------------------------------
Contains/SearchScalar_AllValid/32768/manual_time                    -0.4414         -0.3216             0             0             0             0
Contains/SearchScalar_AllValid/262144/manual_time                   -0.3719         -0.2666             0             0             0             0
Contains/SearchScalar_AllValid/2097152/manual_time                  -0.4689         -0.4047             0             0             0             0
Contains/SearchScalar_AllValid/16777216/manual_time                 -0.7573         -0.7391             1             0             1             0
Contains/SearchScalar_AllValid/134217728/manual_time                -0.8053         -0.8005             5             1             5             1
Contains/SearchScalar_AllValid/268435456/manual_time                -0.8057         -0.8054            10             2            10             2
Contains/SearchScalar_Nulls/32768/manual_time                       -0.3815         -0.2611             0             0             0             0
Contains/SearchScalar_Nulls/262144/manual_time                      -0.3703         -0.2673             0             0             0             0
Contains/SearchScalar_Nulls/2097152/manual_time                     -0.4685         -0.4092             0             0             0             0
Contains/SearchScalar_Nulls/16777216/manual_time                    -0.7554         -0.7358             1             0             1             0
Contains/SearchScalar_Nulls/134217728/manual_time                   -0.8056         -0.8008             5             1             5             1
Contains/SearchScalar_Nulls/268435456/manual_time                   -0.8099         -0.8044            10             2            10             2

ttnghia · 2022-07-06T04:44:26Z

Here are the raw benchmark results:

Before:

-----------------------------------------------------------------------------------------------
Benchmark                                                     Time             CPU   Iterations
-----------------------------------------------------------------------------------------------
Contains/SearchScalar_AllValid/32768/manual_time          0.049 ms        0.068 ms        11579
Contains/SearchScalar_AllValid/262144/manual_time         0.049 ms        0.067 ms        12872
Contains/SearchScalar_AllValid/2097152/manual_time        0.100 ms        0.113 ms         6656
Contains/SearchScalar_AllValid/16777216/manual_time       0.631 ms        0.644 ms         1042
Contains/SearchScalar_AllValid/134217728/manual_time       4.90 ms         4.91 ms          144
Contains/SearchScalar_AllValid/268435456/manual_time       9.80 ms         9.81 ms           71
Contains/SearchScalar_Nulls/32768/manual_time             0.044 ms        0.062 ms        15785
Contains/SearchScalar_Nulls/262144/manual_time            0.049 ms        0.066 ms        13046
Contains/SearchScalar_Nulls/2097152/manual_time           0.101 ms        0.114 ms         6572
Contains/SearchScalar_Nulls/16777216/manual_time          0.637 ms        0.650 ms         1007
Contains/SearchScalar_Nulls/134217728/manual_time          4.91 ms         4.92 ms          134
Contains/SearchScalar_Nulls/268435456/manual_time          9.82 ms         9.83 ms           71

After:

-----------------------------------------------------------------------------------------------
Benchmark                                                     Time             CPU   Iterations
-----------------------------------------------------------------------------------------------
Contains/SearchScalar_AllValid/32768/manual_time          0.028 ms        0.046 ms        25542
Contains/SearchScalar_AllValid/262144/manual_time         0.031 ms        0.049 ms        22731
Contains/SearchScalar_AllValid/2097152/manual_time        0.053 ms        0.067 ms        12746
Contains/SearchScalar_AllValid/16777216/manual_time       0.153 ms        0.168 ms         4571
Contains/SearchScalar_AllValid/134217728/manual_time      0.954 ms        0.980 ms          732
Contains/SearchScalar_AllValid/268435456/manual_time       1.90 ms         1.91 ms          375
Contains/SearchScalar_Nulls/32768/manual_time             0.027 ms        0.046 ms        25510
Contains/SearchScalar_Nulls/262144/manual_time            0.031 ms        0.049 ms        22713
Contains/SearchScalar_Nulls/2097152/manual_time           0.054 ms        0.067 ms        12385
Contains/SearchScalar_Nulls/16777216/manual_time          0.156 ms        0.172 ms         4496
Contains/SearchScalar_Nulls/134217728/manual_time         0.955 ms        0.981 ms          733
Contains/SearchScalar_Nulls/268435456/manual_time          1.87 ms         1.92 ms          375
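The relative-change columns in the comparison table can be reproduced from these raw times. The sketch below assumes Google Benchmark's compare.py formula, (new - old) / |old|, and omits its special case for a zero old time; the function names `relative_change` and `speedup` are hypothetical. For the largest AllValid case (9.80 ms before, 1.90 ms after) it gives about -0.806 and a speedup of about 5.2x, close to the -0.8057 in the comparison table; the small gap presumably comes from the rounded times displayed here.

```cpp
#include <cmath>

// Relative change in the style of compare.py: (new - old) / |old|.
// Negative values mean the new run is faster. The old == 0 special case is omitted.
double relative_change(double old_time, double new_time)
{
  return (new_time - old_time) / std::abs(old_time);
}

// Speedup factor: how many times faster the new run is than the old one.
double speedup(double old_time, double new_time) { return old_time / new_time; }
```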

vuule

Looks good, just a few suggestions.

cpp/src/search/contains_nested.cu

cpp/src/search/contains.cu

vuule · 2022-07-06T14:39:10Z

cpp/benchmarks/search/search.cpp

+BINARY_SEARCH_BENCHMARK_DEFINE(Column_AllValid, false)
+BINARY_SEARCH_BENCHMARK_DEFINE(Column_HasNulls, true)


Not sure if this is intentional, but AFAICT we don't need separate definitions here. Validity can be another parameter, something like this (if `CreateRange` is available in the version we use):

->ArgsProduct({ benchmark::CreateRange(100000, 100000000, 10), {0,1} })
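With validity as a second argument, one benchmark definition covers both the all-valid and the nullable cases, because `ArgsProduct` registers one run per combination of the argument lists. The hypothetical helper `args_product` below (plain C++, not part of Google Benchmark) enumerates that same set of combinations. The four sizes are written out by hand as an illustration instead of being generated by `CreateRange`, and the order in which Google Benchmark itself registers the runs may differ.

```cpp
#include <cstdint>
#include <utility>
#include <vector>

// Hypothetical helper: expand a list of argument lists into every combination
// (the Cartesian product), i.e. the set of runs ArgsProduct would register.
// In this sketch the last list varies fastest.
std::vector<std::vector<int64_t>> args_product(std::vector<std::vector<int64_t>> const& lists)
{
  std::vector<std::vector<int64_t>> result{{}};  // start with one empty combination
  for (auto const& list : lists) {
    std::vector<std::vector<int64_t>> next;
    for (auto const& prefix : result) {
      for (auto const value : list) {
        auto combo = prefix;  // extend each existing prefix with every value of this list
        combo.push_back(value);
        next.push_back(std::move(combo));
      }
    }
    result = std::move(next);
  }
  return result;
}
```

For example, `args_product({{100000, 1000000, 10000000, 100000000}, {0, 1}})` yields 8 argument pairs, from `{100000, 0}` to `{100000000, 1}`.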

Oh, I didn't know about that. I'll try it.

The Google Benchmark version currently used in cudf doesn't support `CreateRange`. I've opened another PR to upgrade it: #11209

@vuule It seems that we can't upgrade Google Benchmark, so unfortunately your suggestion can't be applied here.

I think we should look at this as an opportunity to convert this benchmark to nvbench. The reason for not allowing something like #11209 is to encourage us to switch (which also matters for adjacent initiatives such as building dashboards for better benchmark tracking).


bdice

One suggestion, otherwise LGTM!

ttnghia · 2022-07-08T13:44:55Z

@gpucibot merge

ttnghia added 7 commits July 5, 2022 19:43

Fix warnings

0d4747e

Extract create_table_data function

9c5709c

Run benchmark using random data

a09154a

Use macro in benchmark

fb32f2d

Add more benchmark

4b43d6d

Change benchmark range

1debe3b

Use count_if instead of find

b4b4847

ttnghia added the 3 - Ready for Review, libcudf, Performance, improvement, and non-breaking labels Jul 6, 2022

ttnghia self-assigned this Jul 6, 2022

ttnghia marked this pull request as ready for review July 6, 2022 04:45

ttnghia requested a review from a team as a code owner July 6, 2022 04:45

ttnghia requested review from GregoryKimball, karthikeyann and vuule July 6, 2022 04:45


vuule reviewed Jul 6, 2022


Change float to size_type for benchmark

7528ea0

ttnghia mentioned this pull request Jul 6, 2022

Upgrade Google Benchmark #11209

Closed

bdice requested changes Jul 6, 2022


ttnghia added 4 commits July 6, 2022 10:07

Remove [[maybe_unused]]

9da73de

Remove separator

b8dc84a

MISC

19c73d3

Change seed from static into dynamic, and add back [[maybe_unused]]

d4d063b

bdice reviewed Jul 6, 2022


cpp/benchmarks/search/search.cpp (outdated, resolved)

ttnghia added 4 commits July 6, 2022 10:53

Reverse benchmark

75d5584

Avoid memory round trip

61cd767

Fix typo

0ea3794

Add benchmark

fc9d7bb

ttnghia requested a review from bdice July 6, 2022 18:45

github-actions bot added the CMake label Jul 6, 2022

Add tparam

0223ec7

karthikeyann reviewed Jul 7, 2022


cpp/benchmarks/search/contains.cpp (outdated, resolved)

cpp/benchmarks/search/contains.cpp (outdated, resolved)

ttnghia added 2 commits July 7, 2022 06:35

Fix null frequency

3d5acaf

Remove random seed

3d82ba4

karthikeyann approved these changes Jul 7, 2022


bdice reviewed Jul 8, 2022


cpp/src/search/contains.cu (outdated, resolved)

bdice approved these changes Jul 8, 2022


Simplify code

fdfa9f7

rapids-bot bot merged commit 4d4632a into rapidsai:branch-22.08 Jul 8, 2022

ttnghia deleted the fix_contains branch July 8, 2022 15:19


6 participants
