-
Notifications
You must be signed in to change notification settings - Fork 204
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Refactor thrust::[stable_]partition[_copy]
to use cub::DevicePartition
#1435
Conversation
Performance changes look acceptable. Very slight perf improvements for a many 32-bit offset benchmarks. A few, slight degradations of ~1% and 3% for 2^16 items in the cub.bench.partition.if.base[0] Tesla V100-SXM2-32GB
cub.bench.partition.flagged.base[0] Tesla V100-SXM2-32GB
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is an elegant solution enabling new functionality with little changes in CUB internals. Great work!
{ | ||
// Element type of the input iterator | ||
using value_t = typename iterator_traits<InputIt>::value_type; | ||
std::size_t num_items = static_cast<std::size_t>(thrust::distance(first, last)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
question: I'm not sure about this cast. Temporary array checks for num_items > 0
. If user accidentally provides last < first, we'll try to allocate size_t max. Do you think leaving num items signed until we reach dispatch is any better?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks, good catch on the negative ranges!
I've decided to check and return early for negative/empty ranges. I was concerned about other inconvenient side effects from negative num_items
, like allocating the temp array with negative size for the inplace
version, and returning the rejected iterator negative by the num items count (i.e., rejected_result + num_items - num_selected
). So, instead we now have:
if(thrust::distance(first, last) <= 0){
return thrust::make_pair(selected_result, rejected_result);
}
and
if(thrust::distance(first, last) <= 0){
return first;
}
Description
Closes #1397
Closes #1383
Changes:
AgentSelectIf
: (1) one for the selected items, (2) one for the rejected items inthrust::[stable_]partition_copy
interfacesthrust::[stable_]partition[_if]
thrust::stable_partition_if
rejected
items for 64-bit offset typesFollow-On Tasks
The following tasks are left for follow-on PRs:
cub::DevicePartition
interfaces is left to Add support for largenum_items
todevice_partition.cuh
#1437DevicePartition::StencilIf
interface and addDevicePartition::{If,Flagged,StencilIf}
overloads that take two distinct iterators #1436Checklist