Cardinality of filter Optimization#13721
Conversation
|
|
|
Nice idea of expression optimization. @shixuan-fan , @hellium01 wondering if you can help with a first round of review? -- This helps with us to enlarge the reviewer pool and get you familiar with more part of code~ Speaking of commit convention, in general in Presto code based we always do rebase and don't do merge. |
shixuan-fan
left a comment
There was a problem hiding this comment.
This is a clever optimization :)
There was a problem hiding this comment.
Maybe we could use getOnlyElement(functionCall.getArguments()) assuming cardinality should only have one argument.
I'm not entirely sure about this because if cardinality argument count ever changes, we will fail the query, and the current code would simply ignore this optimization. Let's see what others think.
There was a problem hiding this comment.
ditto to this. cardinality should always has single argument, if not, we should throw.
There was a problem hiding this comment.
I don't like throwing from the optimizer. This is just opportunistic optimization and be defensive and have the exact pattern that you want to match,
There was a problem hiding this comment.
Theoretically you don't need to check that here. Function resolution would have failed if arguments size is not 1. Agree with @hellium01's earlier point that this should be added to RowExpressionRewriteRuleSet. Also when you do that, please use proper API (add isCardinalityFunction isFilterFunction to StandardFunctionResolution) rather than relying on name (CallExpression's name is only used for display purpose)
...ain/java/com/facebook/presto/sql/planner/iterative/rule/SimplifyArrayOperationsRewriter.java
Outdated
Show resolved
Hide resolved
...ain/java/com/facebook/presto/sql/planner/iterative/rule/SimplifyArrayOperationsRewriter.java
Outdated
Show resolved
Hide resolved
There was a problem hiding this comment.
Let's just name this whenClauses.
...ain/java/com/facebook/presto/sql/planner/iterative/rule/SimplifyArrayOperationsRewriter.java
Outdated
Show resolved
Hide resolved
...ain/java/com/facebook/presto/sql/planner/iterative/rule/SimplifyArrayOperationsRewriter.java
Outdated
Show resolved
Hide resolved
There was a problem hiding this comment.
This is inputFunction I believe?
There was a problem hiding this comment.
Let's also revamp this method to be in the following layout to facilitate review process, also, let's also rename the variables following the comments in simplifyCardinalityOfFilterComparedToZero:
- create cast (not required if it could be inlined)
- create case statement
- create input function
- create output function
- create reduce function
presto-tests/src/main/java/com/facebook/presto/tests/AbstractTestQueries.java
Outdated
Show resolved
Hide resolved
presto-main/src/main/java/com/facebook/presto/SystemSessionProperties.java
Outdated
Show resolved
Hide resolved
|
Let's rename the commit title and PR title to "Optimize cardinality of filter expression" and put benchmark results in commit message. Also, is there a benchmark class (like |
hellium01
left a comment
There was a problem hiding this comment.
This is a really smart optimization :). However, we probably need to port it to RowExpressionRewriteRuleSet as ExpressionRewriteRuleSet is deprecating...
There was a problem hiding this comment.
Can you rebase and squash the unrelated commits?
There was a problem hiding this comment.
Also, please follow: https://github.com/prestodb/presto/wiki/Presto-Development-Guidelines
If you are using IntelliJ (which I highly recommend), definitely import the style file from here: https://github.com/airlift/codestyle.
There was a problem hiding this comment.
I thought I had the airift code style. I will doublecheck
...in/src/main/java/com/facebook/presto/sql/planner/iterative/rule/SimplifyArrayOperations.java
Outdated
Show resolved
Hide resolved
There was a problem hiding this comment.
SIMPLIFY_ARRAY_OPERATIONS probably is a little too general? probably OPTIMIZE_FILTER_CARDINALITY will be more specific for user?
There was a problem hiding this comment.
Alternative question to ask: Do you want this flag to guard all future optimization rules that would be added to arrays?
There was a problem hiding this comment.
Alternative question to ask: Do you want this flag to guard all future optimization rules that would be added to arrays?
Yeah - I was thinking about that. It's a tricky balance between having too many flags and controlling what is enabled. Since most of these optimizations are opportunistic and hopefully don't need to be turned off frequently, I thought one is just enough. But i can change it if there is a strong reason.
There was a problem hiding this comment.
Probably indicate it will optimize cardinality(filter(a, x->f(x)) expression in the session description will be more clear to user?
There was a problem hiding this comment.
Most users won't even understand what that means. This flag is only for any unforeseen bugs.
presto-main/src/main/java/com/facebook/presto/SystemSessionProperties.java
Outdated
Show resolved
Hide resolved
There was a problem hiding this comment.
ditto to this. cardinality should always has single argument, if not, we should throw.
There was a problem hiding this comment.
You can use VariableAllocator to make sure there is no confict.
There was a problem hiding this comment.
We don't have to worry about this if we are operating on RowExpression :)
presto-tests/src/main/java/com/facebook/presto/tests/AbstractTestQueries.java
Outdated
Show resolved
Hide resolved
presto-tests/src/main/java/com/facebook/presto/tests/AbstractTestQueries.java
Outdated
Show resolved
Hide resolved
|
From another perspective, the reason user will write thus probably another solution is to create such functions? |
|
Yes that will be a next step but we have several major pipelines using this construct so this will have immediate benefit. |
This is a tricky one to measure because it's not an operator. We can add a benchmark test that loops through this operation a thousand times but not sure. |
|
@kaikalur : The first optimization rule is definitely good to avoid unnecessary array construction. I am wondering if the second and third optimization rule is necessary? , i.e. In my opinion we should just backport And I am not sure if I understand the fourth optimization rule will be helpful, i.e. With the first optimization rule, we already avoid unnecessary array construction. And even with this rule we still need to go through the whole array. And my intuition is the overhead of |
Technically, anymatch works only for > 0, for = 0 it doesn't short circuit to return false on the first match. |
It will stop on the first time the comparison succeeds so if you have a large array, it will benefit |
|
|
We have |
Too much code, yuck! I would rather have a better short-circuiting reducer. But may be I will back port them for now. |
And also NonMatch just calls AnyMatch! I'm not backporting that. It's not a performant implementation. |
|
We should also add a SQL function for this one! |
rongrong
left a comment
There was a problem hiding this comment.
Looking at the implementation I think it would be much simpler if we add a SQL function and just rewrite it with the SQL function. Maybe you want to switch to work on supporting SQL function as builtin functions instead? 🤣
There was a problem hiding this comment.
Alternative question to ask: Do you want this flag to guard all future optimization rules that would be added to arrays?
presto-main/src/main/java/com/facebook/presto/sql/planner/PlanOptimizers.java
Outdated
Show resolved
Hide resolved
There was a problem hiding this comment.
Theoretically you don't need to check that here. Function resolution would have failed if arguments size is not 1. Agree with @hellium01's earlier point that this should be added to RowExpressionRewriteRuleSet. Also when you do that, please use proper API (add isCardinalityFunction isFilterFunction to StandardFunctionResolution) rather than relying on name (CallExpression's name is only used for display purpose)
|
Regarding doing it for RowExpressions - I would say no. The idea is to move these rewrites up the stack to just after parse/resolve and eventually they will be string/template based ones - parse -resovle-(unparse(rewrite)-parse-resolve)* so people get to see the query structure to do more interesting. I'm doing this one for now because it's widely used pattern. |
964421a to
e403600
Compare
e403600 to
6b3c211
Compare
e48d1f3 to
b8f7656
Compare
bbbb1d9 to
e5528c0
Compare
c067cf9 to
d4e6d42
Compare
11de632 to
1a7a3c1
Compare
…atch and any_match.
c6c5a68 to
e06cb87
Compare

Please make sure your submission complies with our Development, Formatting, and Commit Message guidelines.
Fill in the release notes towards the bottom of the PR description.
See Release Notes Guidelines for details.
If release note is NOT required, use: