Fully support nested types in cudf::contains #10656
rapids-bot[bot] merged 394 commits into rapidsai:branch-22.10 from
and keep vanilla element comparator public
Alright, this PR is awake again. It now has a completely new implementation of only about 10 LOC.
cpp/src/search/contains_nested.cu (outdated)
  null_equality::EQUAL,
  nan_equality::ALL_EQUAL,
  stream,
  mr);
Just curious why this detail function returns a device_uvector instead of a column?
Will the result ever be larger than what a column supports?
That detail::contains(table_view, table_view) function was initially refactored out of semi-join (#11100). It is now also used in several other places, but its result is only used for temporary computation and is never returned to the user. Thus, a device_uvector is sufficient for now.
It is still unclear whether we will expose this function to the public.
I would also say that (especially for internal APIs like this, but occasionally even otherwise) I prefer using a strongly typed object like a device_uvector rather than the type-erased (and nullable) column. The strongly typed object carries more information and conveys it to the caller. Even for public APIs, I like uvectors when we don't need type erasure or nullability, because the returned values are clearly defined.
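For context on what that internal function computes: as described above, detail::contains(table_view, table_view) came out of semi-join, so it presumably answers, for each row of one table, whether an equal row exists in another table, which makes one bool per row its natural result. The host-side C++ sketch below models that idea with a hash-set lookup, assuming rows made of nullable int fields. The names host_contains, Row, RowHash, and the local null_equality enum are hypothetical illustrations, not cudf's API; the real implementation runs on the GPU.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <optional>
#include <unordered_set>
#include <vector>

// Hypothetical host-side model: a "row" is a list of nullable int fields.
using Row = std::vector<std::optional<int>>;

// Local stand-in that mirrors the name of cudf's enum; host-only here.
enum class null_equality { EQUAL, UNEQUAL };

// Combines per-field hashes; a null field hashes to a fixed constant.
struct RowHash {
  std::size_t operator()(Row const& r) const {
    std::size_t h = 0;
    for (auto const& f : r) {
      std::size_t const fh = f ? std::hash<int>{}(*f) : std::size_t{0x9e3779b9u};
      h ^= fh + 0x9e3779b9u + (h << 6) + (h >> 2);
    }
    return h;
  }
};

bool has_null(Row const& r) {
  for (auto const& f : r) {
    if (!f) return true;
  }
  return false;
}

// For each needle row, reports whether an equal row exists in haystack.
// std::optional's operator== treats two nulls as equal, which matches
// null_equality::EQUAL; for UNEQUAL, rows containing a null never match.
std::vector<bool> host_contains(std::vector<Row> const& haystack,
                                std::vector<Row> const& needles,
                                null_equality nulls) {
  std::unordered_set<Row, RowHash> set;
  for (auto const& r : haystack) {
    if (nulls == null_equality::UNEQUAL && has_null(r)) continue;
    set.insert(r);
  }
  std::vector<bool> result;
  result.reserve(needles.size());
  for (auto const& n : needles) {
    bool const skip = nulls == null_equality::UNEQUAL && has_null(n);
    result.push_back(!skip && set.count(n) > 0);
  }
  return result;
}
```

Returning std::vector<bool> here plays the role that device_uvector<bool> plays in the discussion: the element type is fixed and there is no null mask to carry, so callers know exactly what each entry means.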
Signed-off-by: Nghia Truong <nghiatruong.vn@gmail.com>
bdice left a comment:
LGTM. I have only very minor suggestions at this point. Thank you!
Co-authored-by: Bradley Dice <bdice@bradleydice.com>
@gpucibot merge
This extends the cudf::contains API to support nested types (lists and structs) with arbitrarily many nesting levels. As a result, cudf::contains works with input data of any type.

In addition, this fixes null handling of cudf::contains for a structs column with a struct scalar key, in the case where the structs column contains null rows at the top level while the scalar key is valid but all of its children are null.

Closes: #8965
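The null-handling fix hinges on the difference between a struct row that is null at the top level and a valid struct whose children are all null. The minimal host-side model below illustrates the intended semantics, assuming a structs column is a vector of std::optional rows and that nulls compare equal (as with null_equality::EQUAL). Fields, StructRow, and structs_contains are hypothetical names for illustration, not cudf code.

```cpp
#include <cassert>
#include <optional>
#include <vector>

// Hypothetical model: the children of one struct row, each nullable.
using Fields = std::vector<std::optional<int>>;
// std::nullopt means the whole struct row is null at the top level.
using StructRow = std::optional<Fields>;

// Returns true if any row of the structs column equals the key.
bool structs_contains(std::vector<StructRow> const& column, StructRow const& key) {
  for (auto const& row : column) {
    // std::optional's operator== treats two top-level nulls as equal and a
    // top-level null as unequal to any valid value, so a null row never
    // matches a valid key, even when all of the key's children are null.
    // For two valid rows, children are compared with nulls equal.
    if (row == key) return true;
  }
  return false;
}
```

Under these assumptions, a column holding a top-level null row does not contain a valid key such as {null, null}; it only contains that key if some valid row also has all-null children.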
Depends on:
- cudf::contains, renaming and switching parameters role #10802
- cudf::contains #10997
- contains with template key types NVIDIA/cuCollections#172
- pair_contains in static_map and static_multimap NVIDIA/cuCollections#173
- left_semi_anti_join, cudf::contains, and set operations #11037