-
Notifications
You must be signed in to change notification settings - Fork 1k
Struct binary comparison op functionality for spark rapids #11153
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Struct binary comparison op functionality for spark rapids #11153
Conversation
bdice
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Approving with suggestions after discussing offline with @rwlee. Thanks!
cpp/src/binaryop/binaryop.cpp
Outdated
| scalar const& s, | ||
| rmm::cuda_stream_view stream, | ||
| rmm::mr::device_memory_resource* mr) | ||
| rmm::device_buffer scalar_col_bitmask_and(column_view const& col, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We have a subtle distinction between valid masks and bitmasks. This case is a valid mask. "Bitmasks" are never nullptr but "valid masks" might not be allocated (just nullptr). This checks if the column is nullable so it's dealing with valid masks.
See past refactoring here: #9588
| template <typename DeviceComparator, class AdaptableUnaryFunction, class Iterator> | ||
| struct device_comparison_functor { | ||
| using optional_iterator = thrust::transform_iterator<AdaptableUnaryFunction, Iterator>; | ||
| device_comparison_functor(optional_iterator const& optional_iter, | ||
| bool const is_lhs_scalar, | ||
| bool const is_rhs_scalar, | ||
| DeviceComparator const& comparator) | ||
| : _optional_iter(optional_iter), | ||
| _is_lhs_scalar(is_lhs_scalar), | ||
| _is_rhs_scalar(is_rhs_scalar), | ||
| _comparator(comparator){}; | ||
|
|
||
| bool __device__ operator()(size_type i) | ||
| { | ||
| return _optional_iter[i].has_value() && | ||
| _comparator(cudf::experimental::row::lhs_index_type{_is_lhs_scalar ? 0 : i}, | ||
| cudf::experimental::row::rhs_index_type{_is_rhs_scalar ? 0 : i}); | ||
| } | ||
|
|
||
| optional_iterator const _optional_iter; | ||
| bool const _is_lhs_scalar; | ||
| bool const _is_rhs_scalar; | ||
| DeviceComparator const _comparator; | ||
| }; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Regarding the comment about this adding net lines of code: it's fine to skip the constructor and use list initialization since the parameters are fairly self-contained.
Also this doesn't need so many template parameters, you can just write template <typename OptionalIterator> and not define using optional_iterator = ...;.
Finally, don't take iterators by reference. They're cheap to copy. See #11084. (But this is moot if we remove the constructor.)
I would try this:
| template <typename DeviceComparator, class AdaptableUnaryFunction, class Iterator> | |
| struct device_comparison_functor { | |
| using optional_iterator = thrust::transform_iterator<AdaptableUnaryFunction, Iterator>; | |
| device_comparison_functor(optional_iterator const& optional_iter, | |
| bool const is_lhs_scalar, | |
| bool const is_rhs_scalar, | |
| DeviceComparator const& comparator) | |
| : _optional_iter(optional_iter), | |
| _is_lhs_scalar(is_lhs_scalar), | |
| _is_rhs_scalar(is_rhs_scalar), | |
| _comparator(comparator){}; | |
| bool __device__ operator()(size_type i) | |
| { | |
| return _optional_iter[i].has_value() && | |
| _comparator(cudf::experimental::row::lhs_index_type{_is_lhs_scalar ? 0 : i}, | |
| cudf::experimental::row::rhs_index_type{_is_rhs_scalar ? 0 : i}); | |
| } | |
| optional_iterator const _optional_iter; | |
| bool const _is_lhs_scalar; | |
| bool const _is_rhs_scalar; | |
| DeviceComparator const _comparator; | |
| }; | |
| template <typename OptionalIterator, typename DeviceComparator> | |
| struct device_comparison_functor { | |
| bool __device__ operator()(size_type i) | |
| { | |
| return optional_iter[i].has_value() && | |
| comparator(cudf::experimental::row::lhs_index_type{is_lhs_scalar ? 0 : i}, | |
| cudf::experimental::row::rhs_index_type{is_rhs_scalar ? 0 : i}); | |
| } | |
| OptionalIterator const optional_iter; | |
| bool const is_lhs_scalar; | |
| bool const is_rhs_scalar; | |
| DeviceComparator const comparator; | |
| }; |
No code changes should be needed at the calling sites.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
FYI: If you have public member variables then you don't need the constructor and can directly use initializer list for instantiating the struct. If the member variables are private then you must have a constructor.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's why we use struct instead of class for functors like this where we expect to use list initialization (and no constructor). The members of a struct are public by default.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I had tried this originally (implicit constructor from initializer list), but it has compilation failures.
/home/ryanlee/dev/rapids/cudf/cpp/src/binaryop/compiled/struct_binary_ops.cuh(88): error: no instance of constructor "device_comparison_functor" matches the argument list
argument types are: (thrust::transform_iterator<cudf::detail::optional_accessor<__nv_bool, cudf::nullate::DYNAMIC>, cudf::column_device_view::count_it, thrust::use_default, thrust::use_default>, __nv_bool, __nv_bool, cudf::experimental::row::lexicographic::less_equivalent_comparator<cudf::experimental::row::lexicographic::strong_index_comparator_adapter<cudf::experimental::row::lexicographic::device_row_comparator<cudf::nullate::DYNAMIC, cudf::experimental::row::lexicographic::sorting_physical_element_comparator>>>)
ttnghia
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks great except that there are a few minor issues mentioned in my comments and one more thing that is desired: using const qualifier for variables as much as possible.
Co-authored-by: Nghia Truong <[email protected]> Co-authored-by: Bradley Dice <[email protected]>
|
@gpucibot merge |
Adds support for Spark's null aware equality binop and expands/improves Java testing for struct binops. Properly tests null structs and full operator testing coverage. Utilizes existing Spark struct binop support with JNI changes to force the full null-aware comparison. Expands on #11153 Partial solution to #8964 -- `NULL_MAX` and `NULL_MIN` still outstanding. Authors: - Ryan Lee (https://github.com/rwlee) Approvers: - Tobias Ribizel (https://github.com/upsj) - Vukasin Milovanovic (https://github.com/vuule) - Jason Lowe (https://github.com/jlowe) URL: #11520
Replaces #9452
Takes the core cudf work from 9452 and integrates it into the JNI while removing struct support integration into cudf's binary operator API. Includes scalar comparison support in addition to vector to vector comparison support.
Follow on needed
Add null_equal operator support