[SPARK-25714] [BACKPORT-2.2] Fix Null Handling in the Optimizer rule BooleanSimplification #22719

gatorsmile · 2018-10-14T05:10:49Z

This PR backports #22702 to branch 2.2.

What changes were proposed in this pull request?

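    import org.apache.spark.sql.SaveMode
    // In spark-shell, `spark` and spark.implicits._ (needed for toDF) are already in scope.
    val df1 = Seq(("abc", 1), (null, 3)).toDF("col1", "col2")
    df1.write.mode(SaveMode.Overwrite).parquet("/tmp/test1")
    val df2 = spark.read.parquet("/tmp/test1")
    // For the row (null, 3) the predicate evaluates to NULL, so the filter should drop it.
    df2.filter("col1 = 'abc' OR (col1 != 'abc' AND col2 == 3)").show()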

Before the PR, the query returns both rows. After the fix, it returns only Row("abc", 1). This PR fixes a bug in NULL handling in the optimizer rule BooleanSimplification; the bug was introduced in the Spark 1.6 release.
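Why the second row must be dropped can be checked by hand with SQL's three-valued logic. The sketch below is not Spark code: it models NULL as `None` in a hypothetical `ThreeValuedLogic` object, and it assumes (the description does not spell this out) that the pre-fix rule rewrote `a OR (NOT a AND b)` to `a OR b`, which is only sound when `a` cannot be NULL.

```scala
// A minimal model of SQL three-valued logic; None stands for NULL.
// All names here are illustrative and are not part of Spark.
object ThreeValuedLogic {
  type SqlBool = Option[Boolean]

  def not(a: SqlBool): SqlBool = a.map(!_)

  // AND is false if either side is false, true if both are true, else NULL.
  def and(a: SqlBool, b: SqlBool): SqlBool = (a, b) match {
    case (Some(false), _) | (_, Some(false)) => Some(false)
    case (Some(true), Some(true))            => Some(true)
    case _                                   => None
  }

  // OR is true if either side is true, false if both are false, else NULL.
  def or(a: SqlBool, b: SqlBool): SqlBool = (a, b) match {
    case (Some(true), _) | (_, Some(true)) => Some(true)
    case (Some(false), Some(false))        => Some(false)
    case _                                 => None
  }

  // col1 = 'abc': comparing against NULL yields NULL.
  def eqAbc(col1: Option[String]): SqlBool = col1.map(_ == "abc")

  // The predicate as written: col1 = 'abc' OR (col1 != 'abc' AND col2 == 3)
  def original(col1: Option[String], col2: Int): SqlBool = {
    val a = eqAbc(col1)
    or(a, and(not(a), Some(col2 == 3)))
  }

  // Assumed unsafe rewrite: col1 = 'abc' OR col2 == 3
  def simplified(col1: Option[String], col2: Int): SqlBool =
    or(eqAbc(col1), Some(col2 == 3))

  // A filter keeps a row only when the predicate is true (not false, not NULL).
  def keeps(p: SqlBool): Boolean = p.contains(true)
}
```

For the row (null, 3), `original` evaluates to NULL, so the filter drops the row, while `simplified` evaluates to true and keeps it. That matches the "both rows" result reported before the fix.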

How was this patch tested?

Added test cases

SparkQA · 2018-10-14T05:13:11Z

Test build #97357 has started for PR 22719 at commit be7a236.

dongjoon-hyun · 2018-10-14T19:39:57Z

Retest this please.

SparkQA · 2018-10-14T22:08:20Z

Test build #97364 has finished for PR 22719 at commit be7a236.

This patch passes all tests.
This patch merges cleanly.
This patch adds no public classes.

gatorsmile · 2018-10-15T18:34:01Z

cc @cloud-fan

cloud-fan · 2018-10-16T01:50:52Z

thanks, merging to 2.2!

…ooleanSimplification

This PR is to backport #22702 to branch 2.2.

---

## What changes were proposed in this pull request?

```Scala
val df1 = Seq(("abc", 1), (null, 3)).toDF("col1", "col2")
df1.write.mode(SaveMode.Overwrite).parquet("/tmp/test1")
val df2 = spark.read.parquet("/tmp/test1")
df2.filter("col1 = 'abc' OR (col1 != 'abc' AND col2 == 3)").show()
```

Before the PR, it returns both rows. After the fix, it returns `Row("abc", 1)`. This is to fix the bug in NULL handling in BooleanSimplification. This is a bug introduced in Spark 1.6 release.

## How was this patch tested?

Added test cases

Closes #22719 from gatorsmile/cherrypickSpark-257142.2.

Authored-by: gatorsmile <[email protected]>
Signed-off-by: Wenchen Fan <[email protected]>

fix.

be7a236

gatorsmile closed this Oct 16, 2018
