diff --git a/docs/sql-migration-guide.md b/docs/sql-migration-guide.md
index 764d63459aba8..2cfe8d6147b4e 100644
--- a/docs/sql-migration-guide.md
+++ b/docs/sql-migration-guide.md
@@ -81,7 +81,7 @@ license: |

   - In Spark version 2.4 and below, you can create a map with duplicated keys via built-in functions like `CreateMap`, `StringToMap`, etc. The behavior of a map with duplicated keys is undefined, for example, map lookup respects the duplicated key that appears first, `Dataset.collect` only keeps the duplicated key that appears last, `MapKeys` returns duplicated keys, etc. In Spark 3.0, Spark throws `RuntimeException` when duplicated keys are found. You can set `spark.sql.mapKeyDedupPolicy` to `LAST_WIN` to deduplicate map keys with the last-wins policy. Users may still read map values with duplicated keys from data sources which do not enforce it (for example, Parquet); in that case, the behavior is undefined.

-  - In Spark 3.0, using `org.apache.spark.sql.functions.udf(AnyRef, DataType)` is not allowed by default. Set `spark.sql.legacy.allowUntypedScalaUDF` to true to keep using it. In Spark version 2.4 and below, if `org.apache.spark.sql.functions.udf(AnyRef, DataType)` gets a Scala closure with primitive-type argument, the returned UDF returns null if the input value is null. However, in Spark 3.0, the UDF returns the default value of the Java type if the input value is null. For example, given `val f = udf((x: Int) => x, IntegerType)`, `f($"x")` returns null in Spark 2.4 and below if column `x` is null, and returns 0 in Spark 3.0. This behavior change is introduced because Spark 3.0 is built with Scala 2.12 by default.
+  - In Spark 3.0, using `org.apache.spark.sql.functions.udf(AnyRef, DataType)` is not allowed by default. It is recommended to remove the return type parameter to automatically switch to the typed Scala `udf` API, or set `spark.sql.legacy.allowUntypedScalaUDF` to true to keep using it. In Spark version 2.4 and below, if `org.apache.spark.sql.functions.udf(AnyRef, DataType)` gets a Scala closure with primitive-type argument, the returned UDF returns null if the input value is null. However, in Spark 3.0, the UDF returns the default value of the Java type if the input value is null. For example, given `val f = udf((x: Int) => x, IntegerType)`, `f($"x")` returns null in Spark 2.4 and below if column `x` is null, and returns 0 in Spark 3.0. This behavior change is introduced because Spark 3.0 is built with Scala 2.12 by default.

   - In Spark 3.0, a higher-order function `exists` follows the three-valued boolean logic, that is, if the `predicate` returns any `null`s and no `true` is obtained, then `exists` returns `null` instead of `false`. For example, `exists(array(1, null, 3), x -> x % 2 == 0)` is `null`. The previous behavior can be restored by setting `spark.sql.legacy.followThreeValuedLogicInArrayExists` to `false`.
diff --git a/sql/core/src/main/scala/org/apache/spark/sql/functions.scala b/sql/core/src/main/scala/org/apache/spark/sql/functions.scala
index 653e1a739aaf1..7b4a35858744a 100644
--- a/sql/core/src/main/scala/org/apache/spark/sql/functions.scala
+++ b/sql/core/src/main/scala/org/apache/spark/sql/functions.scala
@@ -4856,8 +4856,8 @@ object functions {
    * @group udf_funcs
    * @since 2.0.0
    */
-  @deprecated("Untyped Scala UDF API is deprecated, please use typed Scala UDF API such as " +
-    "'def udf[RT: TypeTag](f: Function0[RT]): UserDefinedFunction' instead.", "3.0.0")
+  @deprecated("Scala `udf` method with return type parameter is deprecated. " +
+    "Please use Scala `udf` method without return type parameter.", "3.0.0")
   def udf(f: AnyRef, dataType: DataType): UserDefinedFunction = {
     if (!SQLConf.get.getConf(SQLConf.LEGACY_ALLOW_UNTYPED_SCALA_UDF)) {
       val errorMsg = "You're using untyped Scala UDF, which does not have the input type " +
@@ -4865,7 +4865,7 @@ object functions {
         "argument, and the closure will see the default value of the Java type for the null " +
         "argument, e.g. `udf((x: Int) => x, IntegerType)`, the result is 0 for null input. " +
         "To get rid of this error, you could:\n" +
-        "1. use typed Scala UDF APIs, e.g. `udf((x: Int) => x)`\n" +
+        "1. use typed Scala UDF APIs (without return type parameter), e.g. `udf((x: Int) => x)`\n" +
         "2. use Java UDF APIs, e.g. `udf(new UDF1[String, Integer] { " +
         "override def call(s: String): Integer = s.length() }, IntegerType)`, " +
         "if input types are all non primitive\n" +