Skip to content

Conversation

@kiszk
Copy link
Member

@kiszk kiszk commented Sep 10, 2018

What changes were proposed in this pull request?

This PR can correctly cause assertion failure when incorrect nullable of DataType in the result is generated by a target function to be tested.

Let us think the following example. In the future, a developer would write incorrect code that returns unexpected result. We have to correctly cause fail in this test since valueContainsNull=false while expr includes null. However, without this PR, this test passes. This PR can correctly cause fail.

test("test TARGETFUNCTON") {
  val expr = TARGETMAPFUNCTON()
  // expr = UnsafeMap(3 -> 6, 7 -> null)
  // expr.dataType = (IntegerType, IntegerType, false)

  expected = Map(3 -> 6, 7 -> null)
  checkEvaluation(expr, expected)

In checkEvaluationWithUnsafeProjection, the results are compared using UnsafeRow. When the given expected is converted) to UnsafeRow using the DataType of expr.

val expectedRow = UnsafeProjection.create(Array(expression.dataType, expression.dataType)).apply(lit)

In summary, expr is [0,1800000038,5000000038,18,2,0,700000003,2,0,6,18,2,0,700000003,2,0,6] with and w/o this PR. expected is converted to

  • w/o this PR, [0,1800000038,5000000038,18,2,0,700000003,2,0,6,18,2,0,700000003,2,0,6]
  • with this PR, [0,1800000038,5000000038,18,2,0,700000003,2,2,6,18,2,0,700000003,2,2,6]

As a result, w/o this PR, the test unexpectedly passes.

This is because, w/o this PR, based on given dataType, generated code of projection for expected avoids to set nullbit.

                    // tmpInput_2 is expected
/* 155 */           for (int index_1 = 0; index_1 < numElements_1; index_1++) {
/* 156 */             mutableStateArray_1[1].write(index_1, tmpInput_2.getInt(index_1));
/* 157 */           }

With this PR, generated code of projection for expected always checks whether nullbit should be set by isNullAt

                    // tmpInput_2 is expected
/* 161 */           for (int index_1 = 0; index_1 < numElements_1; index_1++) {
/* 162 */
/* 163 */             if (tmpInput_2.isNullAt(index_1)) {
/* 164 */               mutableStateArray_1[1].setNull4Bytes(index_1);
/* 165 */             } else {
/* 166 */               mutableStateArray_1[1].write(index_1, tmpInput_2.getInt(index_1));
/* 167 */             }
/* 168 */
/* 169 */           }

How was this patch tested?

Existing UTs

@kiszk kiszk changed the title [WIP][SPARK-25388][Test] Detect incorrect nullable of DataType in the result [WIP][SPARK-25388][Test][SQL] Detect incorrect nullable of DataType in the result Sep 10, 2018
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why did you remove the existing test unsafeRow != expectedRow?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you, it was used for debugging.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ah, ok.

@SparkQA
Copy link

SparkQA commented Sep 10, 2018

Test build #95853 has finished for PR 22375 at commit 48e47b2.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented Sep 10, 2018

Test build #95861 has finished for PR 22375 at commit 51aa9d5.

  • This patch fails due to an unknown error code, -9.
  • This patch merges cleanly.
  • This patch adds no public classes.

@kiszk
Copy link
Member Author

kiszk commented Sep 10, 2018

retest this please

@SparkQA
Copy link

SparkQA commented Sep 10, 2018

Test build #95886 has finished for PR 22375 at commit 51aa9d5.

  • This patch fails from timeout after a configured wait of `400m`.
  • This patch merges cleanly.
  • This patch adds no public classes.

@kiszk
Copy link
Member Author

kiszk commented Sep 11, 2018

retest this please

@SparkQA
Copy link

SparkQA commented Sep 11, 2018

Test build #95911 has finished for PR 22375 at commit 51aa9d5.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Add some comments explaining it? Not so straightforward.

Copy link
Member Author

@kiszk kiszk Sep 12, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure, I will add some comments from the description

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we add a test for this change?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for your comment. Do you think what test is preferable?

  • Successfully pass (we already have these UTs since all UTs have been passed)
  • Expectedly failure (In other words, add a function that generated incorrect result)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd like to add a test that will be failed without this patch.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since this patch allows us to correctly detect incorrect results, I added a test that was passed w/o this patch and that is failed with this patch.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

oh, yes. this is going to detect such failure.

@SparkQA
Copy link

SparkQA commented Sep 13, 2018

Test build #96034 has finished for PR 22375 at commit 04a2988.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
  • case class MapIncorrectDataTypeExpression() extends LeafExpression with CodegenFallback

@kiszk kiszk changed the title [WIP][SPARK-25388][Test][SQL] Detect incorrect nullable of DataType in the result [SPARK-25388][Test][SQL] Detect incorrect nullable of DataType in the result Sep 13, 2018
@kiszk
Copy link
Member Author

kiszk commented Sep 13, 2018

cc @cloud-fan @mgaido91

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what happens if here you put Map(3 -> 7, 6 -> -1)?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here is an output. This is because the test correctly detects a failure even in codegen-off mode, too.

"Incorrect evaluation (codegen off): mapincorrectdatatypeexpression(), actual: keys: [3,6], values: [7,null], expected: keys: [3,6], values: [7,-1]" did not contain "Incorrect evaluation in unsafe mode"
ScalaTestFailureLocation: org.apache.spark.sql.catalyst.expressions.ExpressionEvalHelperSuite$$anonfun$4 at (ExpressionEvalHelperSuite.scala:44)
org.scalatest.exceptions.TestFailedException: "Incorrect evaluation (codegen off): mapincorrectdatatypeexpression(), actual: keys: [3,6], values: [7,null], expected: keys: [3,6], values: [7,-1]" did not contain "Incorrect evaluation in unsafe mode"
	at org.scalatest.Assertions$class.newAssertionFailedException(Assertions.scala:528)
...

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see, so the example above passes in codegen off and fails with codegen on with this fix, while using Map(3 -> 7, 6 -> -1) passes codegen on and fails codegen off, am I right?

What I am thinking about (but I have not yet found a working implementation) is: since the problem arise when we say we expect null in a non-nullable datatype, can we add such a check? I mean, instead of pretending the expected value to be nullable, can't we add a check in case it is not nullable for being sure that it does not contain null? I think it would be better, because we would be able to distinguish a failure caused by a bad test, ie. a test written wrongly, from a UT failure caused by a bug in what we are testing. What do you think?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here is summary.

  • Map(3 -> 7, 6 -> null) passes in codegen off and fails in codegen on with this fix
  • Map(3 -> 7, 6 -> -1) fails in codegen off and fails in codegen on

Would it be possible to share examples of two cases that you think we would be able to distinguish?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes, thanks for the summary, it states more clearly what I thought.

My point is that this fix works properly only when we test both codegen on and off, but it would fail to detect the error condition it claims to fix if only one of them (for any reason) is tested. So I am wondering if it is possible to perform a check on the expected value, instead of this fix. Something like:

assert(containsNull(expected) && isNullable(expression.dataType))

where containsNull and isNullable have to be defined properly. In this way we should fail properly independently from whether codegen is on or not. And we can also give a more clear hint in the error message about the problem being most likely a bad UT.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Even if we make it checking recursively, I think that this case cannot be detected. This is because the mismatch occurs in the different recursive path.

Would it be possible to share the case where we distingished a wrong output from a bad written UT in other places, as you proposed?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I said that the suggestion above is wrong and needs to be rewritten in a recursive way. Sorry for the bad suggestion, I just meant to show my idea. So it should be something like:

assert(!containsNullWhereNotNullable(expected, expression.dataType))

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I may not still understand your motivation correctly. What is the motivation to introduce this assertion?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The motivations are the 2 mentioned above. Basically, I am proposing the same suggestion @cloud-fan has just commented here

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

With some hints from @ueshin, this PR implemented the check of null value with nullable bit in checkResult().

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think a more straightforward approach is, validate the expected according to the nullability of the given expression.

@SparkQA
Copy link

SparkQA commented Sep 28, 2018

Test build #96741 has finished for PR 22375 at commit f8d3aeb.

  • This patch fails to build.
  • This patch does not merge cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented Sep 28, 2018

Test build #96742 has finished for PR 22375 at commit be28ab3.

  • This patch fails to build.
  • This patch does not merge cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented Sep 28, 2018

Test build #96745 has finished for PR 22375 at commit 33e589d.

  • This patch fails to build.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented Sep 28, 2018

Test build #96746 has finished for PR 22375 at commit 9ef335d.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@kiszk
Copy link
Member Author

kiszk commented Oct 3, 2018

ping @cloud-fan @mgaido91

exprNullable: Boolean): Boolean = {
val dataType = UserDefinedType.sqlType(exprDataType)

assert(result != null || exprNullable)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we add a description which is more clear about the issue? Something like: The result is null for a non-nullable expression?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure, I will add the description

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we should add message to the assert, e.g. assert(result != null || exprNullable, "xxx")

val input = if (inputRow == EmptyRow) "" else s", input: $inputRow"

val dataType = expression.dataType
if (!checkResult(unsafeRow.get(0, dataType), expected, dataType, expression.nullable)) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why did you add this?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is because this statement checks consistency between expression and its nullable, as you proposed.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

mmmh, I am not sure about this. Do we then still need the code below? Seems to me we are checking the same thing twice, please correct me if I am wrong.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We check different properties in these two if statements.

  1. Line 231 checks consistency between value and nullable in expected
  2. Line 245 checks bit-wise value between expected and expression

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes, I just meant that here we are checking the result and we are doing the same after too. Shouldn't we just add an assert for unsafeRow.get(0, dataType) != null || expression.nullable here instead?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  1. sees only expected. 2. sees expected and expression. Thus, we are doing different.

At 1, as we discussed, we need to check the consistency recursively. IIUC, unsafeRow.get(0, dataType) != null || expression.nullable does not perform checks recursively. Do I make a misunderstanding?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

does not perform checks recursively

good point, I was not considering it. Then, do we need the check at https://github.com/apache/spark/pull/22375/files/9ef335d6e43a6ef7d253d0ed3564f95bd0278f71#diff-41747ec3f56901eb7bfb95d2a217e94dL231? Isn't it performed in checkResult?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think checkResult already validates expression according to the nullability of the given expression at 1. Thus, if the expected is not correct, 2. will detect an incorrect point.

val expected = UTF8String.fromString("abc")

if (!checkResult(actual.head, expected, expressions.head.dataType)) {
if (!checkResult(actual.head, expected, expressions.head.dataType, expressions.head.nullable)) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's a little weird to ask the caller to provide both expected and exprNullable , and then use exprNullable to validate expected. Can we set a default value for exprNullable in checkResult?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That is another option that I thought. On the other hand, to set default has a risk to overlook a possible incosistency between value and nullable at top level of expected.

Do we use the default value at the all of callers of checkResult?

Copy link
Contributor

@cloud-fan cloud-fan Oct 8, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

maybe we should provide an overload of checkResult that takes Expression, which provides dataType and nullable

/**
* Check the equality between result of expression and expected value, it will handle
* Array[Byte], Spread[Double], MapData and Row.
* Array[Byte], Spread[Double], MapData and Row. Also check whether exprNullable is true
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the comment doesn't match the method now

@SparkQA
Copy link

SparkQA commented Oct 11, 2018

Test build #97248 has finished for PR 22375 at commit edc3d7c.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented Oct 11, 2018

Test build #97275 has finished for PR 22375 at commit 5f84e80.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@cloud-fan
Copy link
Contributor

thanks, merging to master!

@asfgit asfgit closed this in c9d7d83 Oct 12, 2018
jackylee-ch pushed a commit to jackylee-ch/spark that referenced this pull request Feb 18, 2019
… result

## What changes were proposed in this pull request?

This PR can correctly cause assertion failure when incorrect nullable of DataType in the result is generated by a target function to be tested.

Let us think the following example. In the future, a developer would write incorrect code that returns unexpected result. We have to correctly cause fail in this test since `valueContainsNull=false` while `expr` includes `null`. However, without this PR, this test passes. This PR can correctly cause fail.

```
test("test TARGETFUNCTON") {
  val expr = TARGETMAPFUNCTON()
  // expr = UnsafeMap(3 -> 6, 7 -> null)
  // expr.dataType = (IntegerType, IntegerType, false)

  expected = Map(3 -> 6, 7 -> null)
  checkEvaluation(expr, expected)
```

In [`checkEvaluationWithUnsafeProjection`](https://github.com/apache/spark/blob/master/sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/ExpressionEvalHelper.scala#L208-L235), the results are compared using `UnsafeRow`. When the given `expected` is [converted](https://github.com/apache/spark/blob/master/sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/ExpressionEvalHelper.scala#L226-L227)) to `UnsafeRow` using the `DataType` of `expr`.
```
val expectedRow = UnsafeProjection.create(Array(expression.dataType, expression.dataType)).apply(lit)
```

In summary, `expr` is `[0,1800000038,5000000038,18,2,0,700000003,2,0,6,18,2,0,700000003,2,0,6]` with and w/o this PR. `expected` is converted to

* w/o  this PR, `[0,1800000038,5000000038,18,2,0,700000003,2,0,6,18,2,0,700000003,2,0,6]`
* with this PR, `[0,1800000038,5000000038,18,2,0,700000003,2,2,6,18,2,0,700000003,2,2,6]`

As a result, w/o this PR, the test unexpectedly passes.

This is because, w/o this PR, based on given `dataType`, generated code of projection for `expected` avoids to set nullbit.
```
                    // tmpInput_2 is expected
/* 155 */           for (int index_1 = 0; index_1 < numElements_1; index_1++) {
/* 156 */             mutableStateArray_1[1].write(index_1, tmpInput_2.getInt(index_1));
/* 157 */           }
```

With this PR, generated code of projection for `expected` always checks whether nullbit should be set by `isNullAt`
```
                    // tmpInput_2 is expected
/* 161 */           for (int index_1 = 0; index_1 < numElements_1; index_1++) {
/* 162 */
/* 163 */             if (tmpInput_2.isNullAt(index_1)) {
/* 164 */               mutableStateArray_1[1].setNull4Bytes(index_1);
/* 165 */             } else {
/* 166 */               mutableStateArray_1[1].write(index_1, tmpInput_2.getInt(index_1));
/* 167 */             }
/* 168 */
/* 169 */           }
```

## How was this patch tested?

Existing UTs

Closes apache#22375 from kiszk/SPARK-25388.

Authored-by: Kazuaki Ishizaki <[email protected]>
Signed-off-by: Wenchen Fan <[email protected]>
maropu pushed a commit that referenced this pull request Aug 21, 2020
…es that can lead to null result

### What changes were proposed in this pull request?

Add document to `ExpressionEvalHelper`, and ask people to explore all the cases that can lead to null results (including null in struct fields, array elements and map values).

This PR also fixes `ComplexTypeSuite.GetArrayStructFields` to explore all the null cases.

### Why are the changes needed?

It happened several times that we hit correctness bugs caused by wrong expression nullability. When writing unit tests, we usually don't test the nullability flag directly, and it's too late to add such tests for all expressions.

In #22375, we extended the expression test framework, which checks the nullability flag when the expected result/field/element is null.

This requires the test cases to explore all the cases that can lead to null results

### Does this PR introduce _any_ user-facing change?

no

### How was this patch tested?

I reverted 5d296ed locally, and `ComplexTypeSuite` can catch the bug.

Closes #29493 from cloud-fan/small.

Authored-by: Wenchen Fan <[email protected]>
Signed-off-by: Takeshi Yamamuro <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

6 participants