-
Notifications
You must be signed in to change notification settings - Fork 1k
Improve performance for cudf::contains when searching for a scalar
#11202
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
|
Here is the benchmark results, comparing the performance after vs before this PR: |
|
Here is the original benchmark results: Before:After: |
This comment was marked as off-topic.
This comment was marked as off-topic.
vuule
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good, just a few suggestions.
cpp/benchmarks/search/search.cpp
Outdated
| BINARY_SEARCH_BENCHMARK_DEFINE(Column_AllValid, false) | ||
| BINARY_SEARCH_BENCHMARK_DEFINE(Column_HasNulls, true) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not sure if intentional, but AFAICT we don't need separate definitions here. Validity can be another parameter, something like (if CreateRange is available in the version that we use)
->ArgsProduct({
benchmark::CreateRange(100000, 100000000, 10),
{0,1}
})
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh I didn't know that there is such way. I'm trying that...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The current Google benchmark version used in cudf doesn't support CreateRange. I've created another PR to update it: #11209
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@vuule It seems that we can't upgrade Google benchmark, so unfortunately your suggestion here can't be worked on.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we should look at this as an opportunity to convert this benchmark to nvbench. The reason not to allow something like #11209 is to encourage us to switch (which is also important for adjacent initiatives like building dashboards around better benchmark tracking).
bdice
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
One suggestion, otherwise LGTM!
|
@gpucibot merge |
The current implementation of
cudf::contains(column_view, scalar)usesthrust::findandthrust::any_of(which also callsthrust::find_ifunder the hood). These thrust APIs were known to have performance regression (NVIDIA/cccl#720).This PR replaces
thrust::findandthrust::any_ofincudf::containsbythrust::count_if, which improves performance significantly.Benchmarks show that the run time can be reduced as much as 80% after modification, or up to 5X speedup.
Closes #3806.