@gatorsmile
Member

This PR backports #22702 to branch 2.2.


What changes were proposed in this pull request?

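    // SaveMode must be imported explicitly; toDF assumes spark.implicits._ is in scope (as in spark-shell).
    import org.apache.spark.sql.SaveMode

    val df1 = Seq(("abc", 1), (null, 3)).toDF("col1", "col2")
    df1.write.mode(SaveMode.Overwrite).parquet("/tmp/test1")
    val df2 = spark.read.parquet("/tmp/test1")
    df2.filter("col1 = 'abc' OR (col1 != 'abc' AND col2 == 3)").show()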

Before this PR, the query above returns both rows. After the fix, it returns only Row("abc", 1). This PR fixes a bug in NULL handling in BooleanSimplification, which was introduced in the Spark 1.6 release.
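
To see why only one row should survive, recall that a SQL comparison against NULL evaluates to NULL, and a filter keeps a row only when its predicate is true. The following is a minimal sketch in plain Scala (no Spark dependency) that models three-valued logic with `Option[Boolean]`. The object and helper names are hypothetical and are not part of Spark. It also evaluates `a OR b`, a rewrite of `a OR (NOT a AND b)` that holds in two-valued logic; whether this is the exact rewrite BooleanSimplification applied is an assumption made for illustration.

```scala
// Hypothetical model of SQL three-valued logic; None stands for NULL.
// None of these names exist in Spark.
object NullLogicSketch {
  type Tri = Option[Boolean]

  // AND is false if either side is false, true if both are true, else NULL.
  def and3(a: Tri, b: Tri): Tri = (a, b) match {
    case (Some(false), _) | (_, Some(false)) => Some(false)
    case (Some(true), Some(true))            => Some(true)
    case _                                   => None
  }

  // OR is true if either side is true, false if both are false, else NULL.
  def or3(a: Tri, b: Tri): Tri = (a, b) match {
    case (Some(true), _) | (_, Some(true)) => Some(true)
    case (Some(false), Some(false))        => Some(false)
    case _                                 => None
  }

  // NOT NULL is NULL.
  def not3(a: Tri): Tri = a.map(!_)

  // Comparing a NULL column value with a literal yields NULL.
  def sqlEq[A](col: Option[A], lit: A): Tri = col.map(_ == lit)

  // col1 = 'abc' OR (col1 != 'abc' AND col2 = 3), as written in the query.
  def original(col1: Option[String], col2: Int): Tri = {
    val a = sqlEq(col1, "abc")
    or3(a, and3(not3(a), sqlEq(Some(col2), 3)))
  }

  // a OR b: agrees with the original only when a is never NULL (assumed rewrite).
  def simplified(col1: Option[String], col2: Int): Tri =
    or3(sqlEq(col1, "abc"), sqlEq(Some(col2), 3))

  // A filter keeps a row only when the predicate is exactly TRUE.
  def keeps(p: Tri): Boolean = p.contains(true)
}
```

In this model, the row (null, 3) evaluates to NULL under the original predicate and is dropped, while the `a OR b` form evaluates to true and keeps it, which matches the two-row result seen before the fix.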

How was this patch tested?

Added test cases

@SparkQA

SparkQA commented Oct 14, 2018

Test build #97357 has started for PR 22719 at commit be7a236.

@dongjoon-hyun
Member

Retest this please.

@SparkQA

SparkQA commented Oct 14, 2018

Test build #97364 has finished for PR 22719 at commit be7a236.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@gatorsmile
Member Author

cc @cloud-fan

@cloud-fan
Contributor

thanks, merging to 2.2!

asfgit pushed a commit that referenced this pull request Oct 16, 2018
…ooleanSimplification

This PR is to backport #22702 to branch 2.2.

---

## What changes were proposed in this pull request?
```Scala
    val df1 = Seq(("abc", 1), (null, 3)).toDF("col1", "col2")
    df1.write.mode(SaveMode.Overwrite).parquet("/tmp/test1")
    val df2 = spark.read.parquet("/tmp/test1")
    df2.filter("col1 = 'abc' OR (col1 != 'abc' AND col2 == 3)").show()
```

Before this PR, the query above returns both rows. After the fix, it returns only `Row("abc", 1)`. This PR fixes a bug in NULL handling in BooleanSimplification, which was introduced in the Spark 1.6 release.

## How was this patch tested?
Added test cases

Closes #22719 from gatorsmile/cherrypickSpark-257142.2.

Authored-by: gatorsmile <[email protected]>
Signed-off-by: Wenchen Fan <[email protected]>
@gatorsmile gatorsmile closed this Oct 16, 2018
Willymontaz pushed a commit to criteo-forks/spark that referenced this pull request Sep 26, 2019
…ooleanSimplification

Willymontaz pushed a commit to criteo-forks/spark that referenced this pull request Sep 27, 2019
…ooleanSimplification
