@gatorsmile
Member

@gatorsmile gatorsmile commented Nov 26, 2018

What changes were proposed in this pull request?

Based on #22857 and #23079, this PR makes a few updates:

  • Limit the data types of NULL to Boolean.
  • Limit the input data type of replaceNullWithFalse to Boolean; throw an exception in the testing mode.
  • Create a new file for the rule ReplaceNullWithFalseInPredicate.
  • Update the description of this rule.
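
For readers unfamiliar with the rule, here is a minimal sketch of the Boolean-only replacement described in the list above. It uses a toy expression model: `Expr`, `Lit`, `Attr`, `FalseLit` and the object name are hypothetical stand-ins, not Spark's Catalyst classes, and the real rule handles more expression types.

```scala
object NullToFalseSketch {
  // Toy type and expression model (hypothetical, not Catalyst).
  sealed trait DataType
  case object BooleanType extends DataType
  case object IntegerType extends DataType

  sealed trait Expr { def dataType: DataType }
  // value == None models a NULL literal of the given type.
  final case class Lit(value: Option[Any], dataType: DataType) extends Expr
  final case class Attr(name: String, dataType: DataType) extends Expr
  final case class And(left: Expr, right: Expr) extends Expr {
    val dataType: DataType = BooleanType
  }
  final case class Or(left: Expr, right: Expr) extends Expr {
    val dataType: DataType = BooleanType
  }

  val FalseLit: Expr = Lit(Some(false), BooleanType)

  def replaceNullWithFalse(e: Expr): Expr = e match {
    // Only a NULL literal typed as Boolean is rewritten.
    case Lit(None, BooleanType) => FalseLit
    // Recurse through the Boolean connectives.
    case And(l, r) => And(replaceNullWithFalse(l), replaceNullWithFalse(r))
    case Or(l, r) => Or(replaceNullWithFalse(l), replaceNullWithFalse(r))
    // Anything else is returned unchanged in this sketch.
    case other => other
  }
}
```

In this sketch a NULL literal of a non-Boolean type, such as `Lit(None, IntegerType)`, is left alone, which mirrors the first bullet.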

How was this patch tested?

Added a test case

@gatorsmile
Member Author

   * `Literal(null, BooleanType)`.
   */
  private def replaceNullWithFalse(e: Expression): Expression = {
    if (e.dataType != BooleanType) {
Contributor

Do we need this? And, Or, and If all return Boolean, and we already require Boolean type for the literal case.

Member Author

What about LambdaFunction? My main concern is that future changes might forget to add it.

Contributor

@cloud-fan cloud-fan Nov 26, 2018

We don't handle LambdaFunction inside this method; that is done on the caller side.

This method deals with optimizable Boolean expressions and returns the original expression if it is not one: https://github.com/apache/spark/pull/23139/files#diff-0bb4fc0a3c867b855f84dd1db8867139R103

Member Author

See this line https://github.com/apache/spark/pull/23139/files/e41681096867cbc6d2556da83ce733092d6df841#diff-a1acb054bc8888376603ef510e6d0ee0

My main concern is that we should not rely entirely on the caller to ensure the data type is Boolean. Future code changes might not follow our current assumption.

Member Author

Had an offline discussion with @rednaxelafx. We can throw an exception instead of silently bypassing it.

@SparkQA

SparkQA commented Nov 26, 2018

Test build #99255 has finished for PR 23139 at commit 6b6997d.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented Nov 26, 2018

Test build #99262 has finished for PR 23139 at commit e416810.

  • This patch fails due to an unknown error code, -9.
  • This patch merges cleanly.
  • This patch adds no public classes.

      And(replaceNullWithFalse(left), replaceNullWithFalse(right))
    case Or(left, right) =>
      Or(replaceNullWithFalse(left), replaceNullWithFalse(right))
    case Literal(null, _) => FalseLiteral
Member Author

What about this line? What happens if the input data type of e is not Boolean?

@gatorsmile
Member Author

retest this please

@SparkQA

SparkQA commented Nov 26, 2018

Test build #99270 has finished for PR 23139 at commit e416810.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@dbtsai
Member

dbtsai commented Nov 26, 2018

Although we try to ensure on the caller side that replaceNullWithFalse is only called when the expression is of Boolean type, I agree that, for safety, we should check it and throw an exception to guard against future changes.

    val message = "Expected a Boolean type expression in replaceNullWithFalse, " +
      s"but got the type `${e.dataType.catalogString}` in `${e.sql}`."
    if (Utils.isTesting) {
      throw new IllegalArgumentException(message)
Member

Is there a test for this? Why not also throw the exception at runtime, since this should never be hit?

Member Author

The tests might not catch every case if test coverage is incomplete, and such an exception should not block query execution. Thus, we throw the exception only in testing mode, not in production.

Member Author

Added a test case.

Member

Sounds fair.
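
As a rough illustration of the testing-mode guard discussed in this thread, the sketch below throws when a testing flag is set and otherwise returns the input unchanged. The helper `guardNonBoolean`, its parameters, and the object name are hypothetical; `isTesting` merely stands in for a flag like `Utils.isTesting`, and returning the expression untouched in production is an assumption based on the comment that the exception should not block query execution.

```scala
object TypeGuardSketch {
  // Hypothetical helper; `typeName` and `sql` stand in for
  // e.dataType.catalogString and e.sql in the snippet above.
  def guardNonBoolean[E](e: E, typeName: String, sql: String, isTesting: Boolean): E = {
    val message = "Expected a Boolean type expression in replaceNullWithFalse, " +
      s"but got the type `$typeName` in `$sql`."
    if (isTesting) {
      // Fail loudly in tests so a caller that breaks the assumption is noticed.
      throw new IllegalArgumentException(message)
    } else {
      // Assumed production behavior: leave the expression as it is.
      e
    }
  }
}
```

A test can then assert that the call fails when the flag is on, while production queries keep running with the unoptimized expression.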

@SparkQA

SparkQA commented Nov 26, 2018

Test build #99283 has finished for PR 23139 at commit 3501420.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@dbtsai
Member

dbtsai commented Nov 26, 2018

LGTM.


/**
* A rule that replaces `Literal(null, BooleanType)` with `FalseLiteral`, if possible, in the search
* condition of the WHERE/HAVING/ON(JOIN) clauses, which contain an implicit Boolean operator
Contributor

@aokolnychyi aokolnychyi Nov 26, 2018

I think the scope of this rule is a bit bigger. For example, it also covers some higher-order functions and the conditions of all If and CaseWhen expressions. Would it make sense to replace "in the search condition of the WHERE/HAVING/ON(JOIN) clauses" with "in predicates"?

Member Author

The extra scope is covered by "Moreover, ..."

/**
* A rule that replaces `Literal(null, BooleanType)` with `FalseLiteral`, if possible, in the search
* condition of the WHERE/HAVING/ON(JOIN) clauses, which contain an implicit Boolean operator
* "(search condition) = TRUE". The replacement is only valid when `Literal(null, BooleanType)` is
Contributor

I am not sure I understand "which contain an implicit Boolean operator "(search condition) = TRUE"". Could you please elaborate a bit?

Member Author

This is based on ANSI SQL. All these clauses have the implicit Boolean operator "(search condition) = TRUE". That is why neither NULL nor FALSE satisfies the condition in these clauses.
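
To make the "(search condition) = TRUE" point concrete, here is a small sketch of SQL three-valued logic, modeling a SQL Boolean as `Option[Boolean]` with `None` for NULL. The names `SqlBool`, `keepsRow`, `and`, `not` and the object name are hypothetical and only illustrate the semantics; they are not Spark APIs.

```scala
object ThreeValuedSketch {
  type SqlBool = Option[Boolean] // None models NULL

  def not(a: SqlBool): SqlBool = a.map(!_)

  def and(a: SqlBool, b: SqlBool): SqlBool = (a, b) match {
    case (Some(false), _) | (_, Some(false)) => Some(false)
    case (Some(true), Some(true)) => Some(true)
    case _ => None
  }

  // A WHERE/HAVING/ON clause keeps a row only when the condition is TRUE.
  def keepsRow(cond: SqlBool): Boolean = cond.contains(true)
}
```

Under this model `keepsRow(None)` and `keepsRow(Some(false))` both reject the row, so NULL can be replaced by FALSE at the top level or under `and`. The replacement is not safe under `not`: `not(None)` stays NULL and rejects the row, while `not(Some(false))` is TRUE and keeps it, which is one reason the rule only applies "if possible".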

@SparkQA

SparkQA commented Nov 27, 2018

Test build #99294 has finished for PR 23139 at commit 8b0401c.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@dbtsai
Member

dbtsai commented Nov 27, 2018

Thanks. Merged into master.

@asfgit asfgit closed this in 85383d2 Nov 27, 2018
jackylee-ch pushed a commit to jackylee-ch/spark that referenced this pull request Feb 18, 2019
…icate

## What changes were proposed in this pull request?

Based on apache#22857 and apache#23079, this PR did a few updates

- Limit the data types of NULL to Boolean.
- Limit the input data type of replaceNullWithFalse to Boolean; throw an exception in the testing mode.
- Create a new file for the rule ReplaceNullWithFalseInPredicate
- Update the description of this rule.

## How was this patch tested?
Added a test case

Closes apache#23139 from gatorsmile/followupSpark-25860.

Authored-by: gatorsmile <[email protected]>
Signed-off-by: DB Tsai <[email protected]>