Skip to content

Conversation

@panbingkun
Copy link
Contributor

@panbingkun panbingkun commented Oct 30, 2022

What changes were proposed in this pull request?

This pr replaces TypeCheckFailure by DataTypeMismatch in type checks in the conditional expressions, includes:

  1. If (2):
    override def checkInputDataTypes(): TypeCheckResult = {
    if (predicate.dataType != BooleanType) {
    TypeCheckResult.TypeCheckFailure(
    "type of predicate expression in If should be boolean, " +
    s"not ${predicate.dataType.catalogString}")
    } else if (!TypeCoercion.haveSameType(inputTypesForMerging)) {
    TypeCheckResult.TypeCheckFailure(s"differing types in '$sql' " +
  2. CaseWhen (2):
    TypeCheckResult.TypeCheckFailure(
    s"WHEN expressions in CaseWhen should all be boolean type, " +
    s"but the ${index + 1}th when expression's type is ${branches(index)._1}")
    }
    } else {
    val branchesStr = branches.map(_._2.dataType).map(dt => s"WHEN ... THEN ${dt.catalogString}")
    .mkString(" ")
    val elseStr = elseValue.map(expr => s" ELSE ${expr.dataType.catalogString}").getOrElse("")
    TypeCheckResult.TypeCheckFailure(
  3. InSubquery (2):
    TypeCheckResult.TypeCheckFailure(
    s"""
    |The number of columns in the left hand side of an IN subquery does not match the
    |number of columns in the output of subquery.
    |#columns in left hand side: ${values.length}.
    |#columns in right hand side: ${query.childOutputs.length}.
    |Left side columns:
    |[${values.map(_.sql).mkString(", ")}].
    |Right side columns:
    |[${query.childOutputs.map(_.sql).mkString(", ")}].""".stripMargin)
    } else if (!DataType.equalsStructurally(
    query.dataType, value.dataType, ignoreNullability = true)) {
    val mismatchedColumns = values.zip(query.childOutputs).flatMap {
    case (l, r) if l.dataType != r.dataType =>
    Seq(s"(${l.sql}:${l.dataType.catalogString}, ${r.sql}:${r.dataType.catalogString})")
    case _ => None
    }
    TypeCheckResult.TypeCheckFailure(
  4. In (1):
    TypeCheckResult.TypeCheckFailure(s"Arguments must be same type but were: " +

Why are the changes needed?

Migration onto error classes unifies Spark SQL error messages.

Does this PR introduce any user-facing change?

Yes. The PR changes user-facing error messages.

How was this patch tested?

  1. Add new UT
  2. Update existed UT
  3. Pass GA

@AmplabJenkins
Copy link

Can one of the admins verify this patch?

@MaxGekk
Copy link
Member

MaxGekk commented Oct 31, 2022

@panbingkun Could you resolve conflicts, please.

.forEach(s ->
Assert.assertTrue(e.getMessage().toLowerCase(Locale.ROOT)
.contains(s.toLowerCase(Locale.ROOT))));
System.out.println(e.getMessage().toLowerCase(Locale.ROOT));
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it needed?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry, I left the debugging code by mistake.

Comment on lines 85 to 91
Arrays.asList(
"datatype_mismatch.data_diff_types",
"cannot resolve \"(a in (b))\"",
"due to data type mismatch: input to `in` should all be the same type, " +
"but it's [\"int\", \"array<int>\"].").forEach(s ->
Assert.assertTrue(e.getMessage().toLowerCase(Locale.ROOT)
.contains(s.toLowerCase(Locale.ROOT))));
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's don't compare error messages. If you cannot call checkError() easily, just check error classes (maybe more if you can).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I will check:
1.Exception Type -> AnalysisException
2.Error classes.

@panbingkun panbingkun requested a review from MaxGekk November 1, 2022 02:21
.forEach(s ->
Assert.assertTrue(e.getMessage().toLowerCase(Locale.ROOT)
.contains(s.toLowerCase(Locale.ROOT))));
Assert.assertTrue(e.message().startsWith("[DATATYPE_MISMATCH.DATA_DIFF_TYPES]"));
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

e is an AnalysisException, correct? Can't you just check its errorClass?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

e is an AnalysisException
image

@panbingkun panbingkun requested a review from MaxGekk November 1, 2022 11:15
Copy link
Member

@MaxGekk MaxGekk left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Waiting for CI. @panbingkun Please, re-trigger GAs.

.contains(s.toLowerCase(Locale.ROOT))));
Assert.assertTrue(e.getErrorClass().equals("DATATYPE_MISMATCH.DATA_DIFF_TYPES"));
Map<String, String> messageParameters = new HashMap() {
{
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just in case, are the inner {} really needed?

@MaxGekk
Copy link
Member

MaxGekk commented Nov 2, 2022

+1, LGTM. Merging to master.
Thank you, @panbingkun.

@MaxGekk MaxGekk closed this in 23c54ad Nov 2, 2022
@panbingkun panbingkun deleted the SPARK-40748 branch November 7, 2022 02:11
SandishKumarHN pushed a commit to SandishKumarHN/spark that referenced this pull request Dec 12, 2022
…or classes

### What changes were proposed in this pull request?
This pr replaces TypeCheckFailure by DataTypeMismatch in type checks in the conditional expressions, includes:

1. If (2): https://github.com/apache/spark/blob/1431975723d8df30a25b2333eddcfd0bb6c57677/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/conditionalExpressions.scala#L61-L67
2. CaseWhen (2): https://github.com/apache/spark/blob/1431975723d8df30a25b2333eddcfd0bb6c57677/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/conditionalExpressions.scala#L175-L183
3. InSubquery (2):
https://github.com/apache/spark/blob/1431975723d8df30a25b2333eddcfd0bb6c57677/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/predicates.scala#L378-L396
4. In (1):
https://github.com/apache/spark/blob/1431975723d8df30a25b2333eddcfd0bb6c57677/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/predicates.scala#L453

### Why are the changes needed?
Migration onto error classes unifies Spark SQL error messages.

### Does this PR introduce _any_ user-facing change?
Yes. The PR changes user-facing error messages.

### How was this patch tested?
1. Add new UT
2. Update existed UT
3. Pass GA

Closes apache#38438 from panbingkun/SPARK-40748.

Authored-by: panbingkun <[email protected]>
Signed-off-by: Max Gekk <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants