Closed
@@ -620,6 +620,12 @@ case class Cast(child: Expression, dataType: DataType, timeZoneId: Option[String
// We can return what the children return. Same thing should happen in the codegen path.
if (DataType.equalsStructurally(from, to)) {
  identity
} else if (from == NullType) {
  // According to `canCast`, NullType can be cast to any type.
  // For primitive types, we don't reach here because of the guard in `nullSafeEval`.
  // But for nested types like struct, we might reach here for a nested null-type field.
  // The returned function is never actually called; it only serves as a placeholder.
  _ => throw new SparkException(s"should not directly cast from NullType to $to.")
Contributor

When will we hit this exception?

Member Author

When casting a struct or array, we check each nested field or element first: if it is null, we skip the cast function and set the destination to null directly. We hit this exception only if the cast function is called anyway.

Member Author

The struct cast does the following:

buildCast[InternalRow](_, row => {
  val newRow = new GenericInternalRow(from.fields.length)
  var i = 0
  while (i < row.numFields) {
    newRow.update(i,
      // We don't call cast function but directly set a null, if finding a null.
      if (row.isNullAt(i)) null else castFuncs(i)(row.get(i, from.apply(i).dataType)))
    i += 1
  }
  newRow
})

Contributor

So it's a bug if this exception is ever thrown?

Member Author

Reaching here means we called the cast function on a null value unnecessarily.
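
The pattern discussed in this thread can be sketched in plain Scala. This is only a minimal illustration under stated assumptions: `SimpleType`, `castFunc`, and `castRow` are hypothetical stand-ins, not Spark's `DataType`, `Cast`, or `InternalRow` APIs, and the set of supported casts is deliberately tiny.

```scala
// Simplified stand-ins for Spark's data types; every name here is hypothetical.
sealed trait SimpleType
case object SimpleNull extends SimpleType
case object SimpleInt extends SimpleType
case object SimpleString extends SimpleType

object NestedNullCastSketch {
  type CastFunc = Any => Any

  // Build the cast function for one (from, to) field pair.
  def castFunc(from: SimpleType, to: SimpleType): CastFunc =
    if (from == to) {
      identity[Any]
    } else if (from == SimpleNull) {
      // Placeholder: a null-typed field can only hold null, and the caller
      // handles nulls itself, so this function should never be invoked.
      _ => throw new IllegalStateException(s"should not directly cast from SimpleNull to $to")
    } else {
      (from, to) match {
        case (SimpleInt, SimpleString) => (v: Any) => v.toString
        case (SimpleString, SimpleInt) => (v: Any) => v.asInstanceOf[String].toInt
        case _ => throw new IllegalArgumentException(s"unsupported cast: $from to $to")
      }
    }

  // Cast a row field by field; null inputs bypass the cast function entirely.
  def castRow(row: Seq[Any], from: Seq[SimpleType], to: Seq[SimpleType]): Seq[Any] = {
    val funcs = from.zip(to).map { case (f, t) => castFunc(f, t) }
    row.zip(funcs).map { case (value, f) => if (value == null) null else f(value) }
  }
}
```

With these definitions, `NestedNullCastSketch.castRow(Seq(1, null), Seq(SimpleInt, SimpleNull), Seq(SimpleString, SimpleInt))` evaluates to `Seq("1", null)` without touching the placeholder, whereas calling `castFunc(SimpleNull, SimpleInt)(null)` directly throws.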

} else {
  to match {
    case dt if dt == from => identity[Any]
@@ -990,4 +990,19 @@ class CastSuite extends SparkFunSuite with ExpressionEvalHelper {
      }
    }
  }

  test("SPARK-27671: cast from nested null type in struct") {
    import DataTypeTestUtils._

    atomicTypes.foreach { atomicType =>
      val struct = Literal.create(
        InternalRow(null),
        StructType(Seq(StructField("a", NullType, nullable = true))))

      val ret = cast(struct, StructType(Seq(
        StructField("a", atomicType, nullable = true))))
      assert(ret.resolved)
      checkEvaluation(ret, InternalRow(null))
    }
  }
}
@@ -2157,4 +2157,13 @@ class DataFrameSuite extends QueryTest with SharedSQLContext {
|*(1) Range (0, 10, step=1, splits=2)""".stripMargin))
    }
  }

  test("SPARK-27671: Fix analysis exception when casting null in nested field in struct") {
    val df = sql("SELECT * FROM VALUES (('a', (10, null))), (('b', (10, 50))), " +
      "(('c', null)) AS tab(x, y)")
    checkAnswer(df, Row("a", Row(10, null)) :: Row("b", Row(10, 50)) :: Row("c", null) :: Nil)

    val cast = sql("SELECT cast(struct(1, null) AS struct<a:int,b:int>)")
    checkAnswer(cast, Row(Row(1, null)) :: Nil)
  }
}