[SPARK-9403][SQL][WIP] codeGen in / inSet #7778
@rxin Is there a better way to expose hset to the generated code?
This won't work when you have multiple queries. Let me see...
Can you beef up the test cases for In and InSet in PredicateSuite to include all primitive data types?
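A minimal sketch of the kind of check being asked for, assuming the goal is that the linear IN path and the hash-set InSet path agree for every primitive type. `InVsInSetCheck`, `inLinear`, `inHashed`, `agree` and `checkAllPrimitives` are hypothetical plain-Scala helpers, not Spark's `In`/`InSet` expressions or the actual PredicateSuite tests. The sample values deliberately avoid floating-point special cases such as NaN.

```scala
// Hypothetical helpers for illustration; not Spark's In/InSet or PredicateSuite.
object InVsInSetCheck {
  // Linear scan over the list, like an unrolled chain of IN comparisons.
  def inLinear[T](value: T, list: Seq[T]): Boolean = list.exists(_ == value)

  // Single hash-set lookup, like InSet.
  def inHashed[T](value: T, set: Set[T]): Boolean = set.contains(value)

  // True if both strategies give the same answer for every probe value.
  def agree[T](list: Seq[T], probes: Seq[T]): Boolean = {
    val set = list.toSet
    probes.forall(p => inLinear(p, list) == inHashed(p, set))
  }

  // One case per primitive type; the probes mix hits and misses.
  def checkAllPrimitives(): Map[String, Boolean] = Map(
    "boolean" -> agree(Seq(true), Seq(true, false)),
    "byte"    -> agree(Seq[Byte](1, 2, 3), Seq[Byte](0, 2, 127)),
    "short"   -> agree(Seq[Short](1, 2, 3), Seq[Short](-1, 3, 300)),
    "int"     -> agree(Seq(1, 2, 3), Seq(0, 1, Int.MaxValue)),
    "long"    -> agree(Seq(1L, 2L, 3L), Seq(0L, 3L, Long.MinValue)),
    "float"   -> agree(Seq(1.5f, 2.5f), Seq(1.5f, 3.0f)),
    "double"  -> agree(Seq(1.5, 2.5), Seq(2.5, 3.0))
  )
}
```

`InVsInSetCheck.checkAllPrimitives()` returns a map from type name to whether the two lookup strategies agreed on every probe.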
Actually, now that I think about it more, in most cases it's probably a lot faster to just use the generated code for IN than a hash set lookup. For the hash set case, it might make sense to change the OptimizeIn rule to convert In to InSet only when the number of items is greater than, say, 50.
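A sketch of the threshold idea on a toy expression tree. `Expr`, `Attr`, `Lit`, `ToyIn`, `ToyInSet` and `OptimizeInSketch` are hypothetical stand-ins, not Catalyst classes, and the cut-off of 50 simply follows the figure suggested above.

```scala
// Toy expression tree for illustration only; these are not Catalyst's classes.
sealed trait Expr
case class Attr(name: String) extends Expr
case class Lit(value: Any) extends Expr
// IN over an explicit list; codegen can turn it into a chain of comparisons.
case class ToyIn(value: Expr, list: Seq[Expr]) extends Expr
// IN backed by a prebuilt hash set; one lookup per row.
case class ToyInSet(value: Expr, hset: Set[Any]) extends Expr

object OptimizeInSketch {
  // Assumed cut-off, following the "say 50" figure from the review comment.
  val Threshold = 50

  // Rewrites ToyIn to ToyInSet only when every list item is a literal and the
  // list is longer than the threshold; short lists stay as ToyIn so they can
  // be code-generated as plain comparisons.
  def rewrite(e: Expr, threshold: Int = Threshold): Expr = e match {
    case ToyIn(v, list) if list.size > threshold && list.forall(_.isInstanceOf[Lit]) =>
      ToyInSet(v, list.collect { case Lit(x) => x }.toSet)
    case other => other
  }
}
```

For instance, `OptimizeInSketch.rewrite(ToyIn(Attr("a"), (1 to 60).map(i => Lit(i))))` yields a `ToyInSet` holding 60 values, while a three-element list is returned unchanged.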
As for passing in the hash set, one way to do it is to relax the type constraint on references in the code generator and allow passing in arbitrary objects.
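A minimal sketch of what relaxing the reference type could look like, assuming a context that keeps its references in an untyped buffer and hands them to the generated class as an `Object[]` named `references`. `ToyCodegenContext`, `addReference`, `ToyInSetCodegen` and `genInSet` are hypothetical names, not Spark's codegen API.

```scala
import scala.collection.mutable.ArrayBuffer

// Hypothetical, simplified codegen context; not Spark's actual context class.
// References are typed as Any so that a non-expression object, such as a
// prebuilt hash set, can be handed to the generated class.
class ToyCodegenContext {
  val references = ArrayBuffer.empty[Any]

  // Registers obj and returns a Java expression that reads it back with a cast.
  def addReference(obj: Any, javaType: String): String = {
    val idx = references.length
    references += obj
    s"(($javaType) references[$idx])"
  }
}

object ToyInSetCodegen {
  // Emits a Java statement testing valueTerm against a hash set that is passed
  // through the references array instead of being written into the source.
  def genInSet(ctx: ToyCodegenContext, hset: java.util.Set[Any],
               valueTerm: String, resultTerm: String): String = {
    val setTerm = ctx.addReference(hset, "java.util.Set")
    s"boolean $resultTerm = $setTerm.contains($valueTerm);"
  }
}
```

On a fresh context, `ToyInSetCodegen.genInSet(ctx, set, "value1", "isIn1")` returns `boolean isIn1 = ((java.util.Set) references[0]).contains(value1);`. Because the set travels through `references`, the generated text does not depend on the set's contents.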
Jenkins, ok to test.
Jenkins, add to whitelist.
Test build #1249 has finished for PR 7778 at commit
This continues tarekauel's work in #7778.

Author: Liang-Chi Hsieh <[email protected]>
Author: Tarek Auel <[email protected]>

Closes #7893 from viirya/codegen_in and squashes the following commits:

81ff97b [Liang-Chi Hsieh] For comments.
47761c6 [Liang-Chi Hsieh] Merge remote-tracking branch 'upstream/master' into codegen_in
cf4bf41 [Liang-Chi Hsieh] For comments.
f532b3c [Liang-Chi Hsieh] Merge remote-tracking branch 'upstream/master' into codegen_in
446bbcd [Liang-Chi Hsieh] Fix bug.
b3d0ab4 [Liang-Chi Hsieh] Merge remote-tracking branch 'upstream/master' into codegen_in
4610eff [Liang-Chi Hsieh] Relax the types of references and update optimizer test.
224f18e [Liang-Chi Hsieh] Beef up the test cases for In and InSet to include all primitive data types.
86dc8aa [Liang-Chi Hsieh] Only convert In to InSet when the number of items in set is more than the threshold.
b7ded7e [Tarek Auel] [SPARK-9403][SQL] codeGen in / inSet

(cherry picked from commit e1e0587)
Signed-off-by: Davies Liu <[email protected]>
Jira: https://issues.apache.org/jira/browse/SPARK-9403
@rxin ping