Optimize distinct inner join to use set find instead of retrieve #17278
Performance comparison with RTX8000
Thank you for posting the benchmark results!
Valid concern. In the worst case, the slowdown can reach 20%, but the additional runtime is only a few microseconds. On the other hand, in cases where the new implementation performs better, the speedup exceeds 10 microseconds in most cases, so I believe the optimization is still worthwhile overall.
Good question. It's not super obvious from the performance results, but the new implementation outperforms the previous one in most cases. The exception is small inputs, such as when both the left and right tables contain no more than 10,000 rows of integers.
💯
LGTM 👍
How does retrieve_all compare to retrieve and find? Is retrieve_all slower than find?
Good question. The two methods use different algorithms.
/merge
@abellina have you observed any performance impacts from this change in Spark-RAPIDS?
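To make the difference in output contracts concrete, here is a toy, sequential Python model of the three query styles on a hash set. It is an illustrative sketch under stated assumptions, not cuco's implementation: `ToySet`, `EMPTY`, and the linear-probing layout are hypothetical, and the GPU atomic is only modeled by a plain counter. The point it shows: `find` writes exactly one result per query at a fixed position, `retrieve` emits only the matches and therefore must claim output positions (an atomic increment on a GPU), and `retrieve_all` ignores queries and scans every slot.

```python
from typing import List, Optional, Sequence, Tuple

EMPTY: Optional[int] = None  # hypothetical marker for an unused slot / failed lookup


class ToySet:
    """Toy open-addressing hash set with linear probing (assumption: capacity > #keys)."""

    def __init__(self, capacity: int) -> None:
        self.slots: List[Optional[int]] = [EMPTY] * capacity

    def _probe(self, key: int) -> int:
        """Return the slot holding `key`, or the first empty slot on its probe path."""
        n = len(self.slots)
        start = hash(key) % n
        for step in range(n):
            i = (start + step) % n
            if self.slots[i] is EMPTY or self.slots[i] == key:
                return i
        raise RuntimeError("set is full")

    def insert(self, key: int) -> None:
        self.slots[self._probe(key)] = key

    def find(self, queries: Sequence[int]) -> List[Optional[int]]:
        # One output per query at a fixed position (out[i] belongs to queries[i]),
        # so parallel workers never contend for output space.
        return [self.slots[self._probe(q)] for q in queries]

    def retrieve(self, queries: Sequence[int]) -> Tuple[List[int], List[int]]:
        # Emits only matching (query index, key) pairs, so output positions are not
        # known in advance. A set has at most one match per query, so len(queries)
        # bounds the output size here.
        out_query: List[int] = [0] * len(queries)
        out_key: List[int] = [0] * len(queries)
        counter = 0
        for i, q in enumerate(queries):
            k = self.slots[self._probe(q)]
            if k is not EMPTY:
                pos = counter  # models atomicAdd(&counter, 1) returning the old value
                counter += 1
                out_query[pos] = i
                out_key[pos] = k
        return out_query[:counter], out_key[:counter]

    def retrieve_all(self) -> List[int]:
        # Takes no queries: scans every slot and returns each stored key.
        return [k for k in self.slots if k is not EMPTY]
```

For example, after inserting 1, 2 and 3 into `ToySet(8)`, `find([2, 5, 3])` returns `[2, None, 3]`, `retrieve([2, 5, 3])` returns `([0, 2], [2, 3])`, and `retrieve_all()` returns the three stored keys. Because `retrieve_all` answers a different question (dump the container) than `find` (look up these keys), comparing their speed directly is only meaningful for a specific workload.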
Description
This PR introduces a minor optimization for distinct inner joins by using the find results to selectively copy matches to the output. This approach eliminates the need for the costly retrieve operation, which relies on expensive atomic operations.
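The two-step idea in the description can be sketched as a small, sequential Python model. This is a sketch under stated assumptions, not cuDF's code: the helper names and the `NOT_FOUND` sentinel are hypothetical, and it assumes the GPU version compacts matches with something like `thrust::copy_if`, whose output offsets come from a prefix sum over match flags rather than from a per-match atomic counter. Step 1 performs one independent `find` per probe row; step 2 keeps only the matched rows. Because a distinct join assumes unique build-side keys, each probe row matches at most one build row, which is why a single `find` per row is enough.

```python
from itertools import accumulate
from typing import Dict, List, Sequence, Tuple

NOT_FOUND = -1  # hypothetical sentinel standing in for "find returned end()"


def build_distinct_table(build_keys: Sequence[int]) -> Dict[int, int]:
    """Map each build-side key to its row index (assumes build keys are unique)."""
    return {key: row for row, key in enumerate(build_keys)}


def find_matches(table: Dict[int, int], probe_keys: Sequence[int]) -> List[int]:
    """Step 1: one independent lookup per probe row; result i goes to slot i."""
    return [table.get(key, NOT_FOUND) for key in probe_keys]


def compact_matches(found: Sequence[int]) -> Tuple[List[int], List[int]]:
    """Step 2: keep only matched rows, placing them via an exclusive prefix sum."""
    flags = [1 if b != NOT_FOUND else 0 for b in found]
    # offsets[i] is the output position of probe row i; offsets[-1] is the total.
    offsets = list(accumulate(flags, initial=0))
    total = offsets[-1]
    build_idx = [0] * total
    probe_idx = [0] * total
    for probe_row, (build_row, flag, pos) in enumerate(zip(found, flags, offsets)):
        if flag:
            build_idx[pos] = build_row
            probe_idx[pos] = probe_row
    return build_idx, probe_idx


def distinct_inner_join(
    build_keys: Sequence[int], probe_keys: Sequence[int]
) -> Tuple[List[int], List[int]]:
    """Return (build row indices, probe row indices) of all matching pairs."""
    table = build_distinct_table(build_keys)
    return compact_matches(find_matches(table, probe_keys))
```

For example, joining build keys `[10, 20, 30]` with probe keys `[20, 40, 10, 30, 20]` gives build indices `[1, 0, 2, 1]` and probe indices `[0, 2, 3, 4]`; the unmatched probe row (key 40) is dropped during compaction. Every output position is computed from the scan, so no two writers ever compete for the same slot.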