-
Notifications
You must be signed in to change notification settings - Fork 29k
[SPARK-25860][SPARK-26107] [FOLLOW-UP] Rule ReplaceNullWithFalseInPredicate #23139
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
| * `Literal(null, BooleanType)`. | ||
| */ | ||
| private def replaceNullWithFalse(e: Expression): Expression = { | ||
| if (e.dataType != BooleanType) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
do we need this? And, Or, If all return boolean, and we already requires boolean type for literal case.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How about the LambdaFunction? My major concern is the future changes might forget to add it?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We don't handle LambdaFunction inside this method, it's caller side.
This method is to deal with optimizable boolean expressions, and return the original expression if it's not: https://github.com/apache/spark/pull/23139/files#diff-0bb4fc0a3c867b855f84dd1db8867139R103
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My major concern is we should not completely rely on the caller to ensure the data type is Boolean. In the future, the new code changes might not completely follow our current assumption.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Had an offline discussion with @rednaxelafx . We can issue an exception instead of silently bypass it.
|
Test build #99255 has finished for PR 23139 at commit
|
|
Test build #99262 has finished for PR 23139 at commit
|
| And(replaceNullWithFalse(left), replaceNullWithFalse(right)) | ||
| case Or(left, right) => | ||
| Or(replaceNullWithFalse(left), replaceNullWithFalse(right)) | ||
| case Literal(null, _) => FalseLiteral |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How about this line? What happened if the input data type of e is not Boolean?
|
retest this please |
|
Test build #99270 has finished for PR 23139 at commit
|
|
Although we are trying to make sure in the caller side to only call |
| val message = "Expected a Boolean type expression in replaceNullWithFalse, " + | ||
| s"but got the type `${e.dataType.catalogString}` in `${e.sql}`." | ||
| if (Utils.isTesting) { | ||
| throw new IllegalArgumentException(message) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Test for this? Why not also throw exception in runtime since this should never be hit?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The tests might not catch all the cases if the test coverage is not complete. Such an exception should not block the query execution. Thus, we just throw an exception in our testing mode instead of the production mode.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added a test case.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sounds fair.
|
Test build #99283 has finished for PR 23139 at commit
|
|
LGTM. |
|
|
||
| /** | ||
| * A rule that replaces `Literal(null, BooleanType)` with `FalseLiteral`, if possible, in the search | ||
| * condition of the WHERE/HAVING/ON(JOIN) clauses, which contain an implicit Boolean operator |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the scope of this rule is a bit bigger. For example, some higher-order functions, conditions of all If and CaseWhen expressions. Would it make sense to replace "in the search condition of the WHERE/HAVING/ON(JOIN) clauses" with "in predicates"?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The extra scope is covered by "Moreover, ..."
| /** | ||
| * A rule that replaces `Literal(null, BooleanType)` with `FalseLiteral`, if possible, in the search | ||
| * condition of the WHERE/HAVING/ON(JOIN) clauses, which contain an implicit Boolean operator | ||
| * "(search condition) = TRUE". The replacement is only valid when `Literal(null, BooleanType)` is |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am not sure I understand "which contain an implicit Boolean operator "(search condition) = TRUE"". Could you, please, elaborate a bit?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is based on the ANSI SQL. All these clauses have the implicit Boolean operator "(search condition) = TRUE". That is why NULL and FALSE do not satisfy the condition in these clauses
|
Test build #99294 has finished for PR 23139 at commit
|
|
Thanks. Merged into master. |
…icate ## What changes were proposed in this pull request? Based on apache#22857 and apache#23079, this PR did a few updates - Limit the data types of NULL to Boolean. - Limit the input data type of replaceNullWithFalse to Boolean; throw an exception in the testing mode. - Create a new file for the rule ReplaceNullWithFalseInPredicate - Update the description of this rule. ## How was this patch tested? Added a test case Closes apache#23139 from gatorsmile/followupSpark-25860. Authored-by: gatorsmile <[email protected]> Signed-off-by: DB Tsai <[email protected]>
What changes were proposed in this pull request?
Based on #22857 and #23079, this PR did a few updates
How was this patch tested?
Added a test case