Support set operations #11043
jrhemstad
left a comment
This looks like it has the potential to be a very large PR. I would strongly advocate for breaking this up into smaller, easier-to-review ones, likely doing one of the set_ algorithms at a time.
I'm not sure. If you read my algorithm descriptions (from here), the implementation of each set-op should be very short. I will try to split out smaller dependent PRs if I can figure out what to extract from here. Currently, I see some overlap in implementation for this PR and the |
PointKernel
left a comment
Another round of review, focusing on the implementation. Unit tests not reviewed yet.
Signed-off-by: Nghia Truong <nghiatruong.vn@gmail.com>
bdice
left a comment
@ttnghia I have some comments attached. I think this is some of your best work I've reviewed. Really great job! The way the set algorithms all fall back to contains is very nice -- I love when we can write concise higher-order algorithms like these.
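The "fall back to contains" structure mentioned above can be sketched on the host for a single pair of list rows. This is only an illustrative model, not the PR's GPU implementation: `Row`, `contains_mask`, `distinct`, and `filter_distinct` are hypothetical names, rows are plain `std::vector<int>`, nulls and NaNs are ignored, and keeping first-occurrence order is an assumption of the sketch.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Hypothetical host-side model: one list row is a std::vector<int>.
using Row = std::vector<int>;

// The single primitive: for each element of `needles`, is it present in `haystack`?
std::vector<bool> contains_mask(const Row& haystack, const Row& needles) {
  std::vector<bool> mask;
  mask.reserve(needles.size());
  for (int x : needles) {
    mask.push_back(std::find(haystack.begin(), haystack.end(), x) != haystack.end());
  }
  return mask;
}

// Keep the first occurrence of each value, preserving input order.
Row distinct(const Row& row) {
  Row out;
  for (int x : row) {
    if (std::find(out.begin(), out.end(), x) == out.end()) out.push_back(x);
  }
  return out;
}

// Keep elements of `row` whose mask bit equals `keep`, then drop duplicates.
Row filter_distinct(const Row& row, const std::vector<bool>& mask, bool keep) {
  Row kept;
  for (std::size_t i = 0; i < row.size(); ++i) {
    if (mask[i] == keep) kept.push_back(row[i]);
  }
  return distinct(kept);
}

// True if any element of `lhs` is contained in `rhs`.
bool have_overlap(const Row& lhs, const Row& rhs) {
  const auto mask = contains_mask(rhs, lhs);
  return std::any_of(mask.begin(), mask.end(), [](bool b) { return b; });
}

// Elements of `lhs` that are contained in `rhs`, without duplicates.
Row intersect_distinct(const Row& lhs, const Row& rhs) {
  return filter_distinct(lhs, contains_mask(rhs, lhs), true);
}

// Elements of `lhs` that are NOT contained in `rhs`, without duplicates.
Row difference_distinct(const Row& lhs, const Row& rhs) {
  return filter_distinct(lhs, contains_mask(rhs, lhs), false);
}
```

In this model each operation is a thin wrapper around one `contains_mask` call, which is the property the comment praises; the real implementation may differ in every detail.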
bdice
left a comment
A few minor suggestions, otherwise LGTM!
PointKernel
left a comment
Looks great!
I've learned a lot by going through your PRs in the chain and here we are finally! Thanks @ttnghia for your persistence and effort put into this work.
Signed-off-by: Nghia Truong <nghiatruong.vn@gmail.com>
Co-authored-by: Bradley Dice <bdice@bradleydice.com>
@gpucibot merge
This PR adds Java bindings for the set-like operations:
* `lists::have_overlap`
* `lists::intersect_distinct`
* `lists::union_distinct`
* `lists::difference_distinct`

Depends on:
* #11043
* #11220

New Java APIs start here: https://github.com/rapidsai/cudf/pull/11143/files#diff-50ba2711690aca8e4f28d7b491373a4dd76443127c8b452a77b6c1fe2388d9e3R3545

Authors:
- Nghia Truong (https://github.com/ttnghia)

Approvers:
- Robert (Bobby) Evans (https://github.com/revans2)

URL: #11143
This PR adds the following APIs for set operations:
* `lists::have_overlap`
* `lists::intersect_distinct`
* `lists::union_distinct`
* `lists::difference_distinct`

Name Convention

Except for the first API (`lists::have_overlap`), which returns a boolean column, the suffix `_distinct` of the remaining APIs denotes that their results are lists columns in which every list row has been post-processed to remove duplicates. As such, their results are actually "set" columns in which each row is a "set" of distinct elements.

Depends on:
* `static_multimap::pair_contains` NVIDIA/cuCollections#175
* `duplicate_keep_option` in `cudf::distinct` #11052
* `nan_equality` in `cudf::distinct` #11118
* `semi_anti_join` #11100
* `lists::distinct` and `cudf::detail::stable_distinct` #11149

Closes #10409.
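To make the `_distinct` naming convention concrete, here is a minimal host-side sketch of the row-wise semantics for a union. It is not the cudf API: `Row`, `ListsColumn`, and `union_distinct_row` are hypothetical stand-ins, a lists column is modelled as a vector of integer vectors, nulls are ignored, and both the equal-row-count check and the first-occurrence output order are assumptions of the sketch.

```cpp
#include <algorithm>
#include <cstddef>
#include <stdexcept>
#include <vector>

// Hypothetical host-side stand-ins for a list row and a lists column.
using Row = std::vector<int>;
using ListsColumn = std::vector<Row>;

// Union of two rows with duplicates removed, so the result row is a "set".
Row union_distinct_row(const Row& lhs, const Row& rhs) {
  Row out;
  // Append only values that are not already present in the output.
  auto append_new = [&out](const Row& src) {
    for (int x : src) {
      if (std::find(out.begin(), out.end(), x) == out.end()) out.push_back(x);
    }
  };
  append_new(lhs);
  append_new(rhs);
  return out;
}

// Apply the row-level union to each pair of rows at the same index.
ListsColumn union_distinct(const ListsColumn& lhs, const ListsColumn& rhs) {
  if (lhs.size() != rhs.size()) {
    throw std::invalid_argument("input columns must have the same number of rows");
  }
  ListsColumn out;
  out.reserve(lhs.size());
  for (std::size_t i = 0; i < lhs.size(); ++i) {
    out.push_back(union_distinct_row(lhs[i], rhs[i]));
  }
  return out;
}
```

Each output row contains every value at most once, even when an input row repeats a value, which is what the `_distinct` suffix signals.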