[SPARK-31010][SQL][ML][FOLLOW-UP] Throw exception when use untyped UDF by default #27488
SQLConf.scala:

```diff
@@ -2016,6 +2016,14 @@ object SQLConf {
     .booleanConf
     .createWithDefault(false)
 
+  val LEGACY_ALLOW_UNTYPED_SCALA_UDF =
+    buildConf("spark.sql.legacy.allowUntypedScalaUDF")
+      .internal()
+      .doc("When set to true, user is allowed to use org.apache.spark.sql.functions." +
+        "udf(f: AnyRef, dataType: DataType). Otherwise, an exception will be thrown.")
+      .booleanConf
+      .createWithDefault(false)
+
   val TRUNCATE_TABLE_IGNORE_PERMISSION_ACL =
     buildConf("spark.sql.truncateTable.ignorePermissionAcl.enabled")
       .internal()
```
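For reference, here is a minimal sketch of how a user could opt back into the legacy untyped API once this lands; the SparkSession setup is illustrative and not part of the PR:

```scala
import org.apache.spark.sql.SparkSession

// Illustrative session setup (not part of this PR).
val spark = SparkSession.builder().master("local[*]").getOrCreate()

// The new flag defaults to false, which makes
// functions.udf(f: AnyRef, dataType: DataType) throw an AnalysisException.
// Setting it to true restores the previous behavior.
spark.conf.set("spark.sql.legacy.allowUntypedScalaUDF", "true")
```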
functions.scala:

```diff
@@ -4732,6 +4732,15 @@ object functions {
    * @since 2.0.0
    */
   def udf(f: AnyRef, dataType: DataType): UserDefinedFunction = {
+    if (!SQLConf.get.getConf(SQLConf.LEGACY_ALLOW_UNTYPED_SCALA_UDF)) {
+      val errorMsg = "You're using untyped Scala UDF, which does not have the input type " +
+        "information. Spark may blindly pass null to the Scala closure with primitive-type " +
+        "argument, and the closure will see the default value of the Java type for the null " +
+        "argument, e.g. `udf((x: Int) => x, IntegerType)`, the result is 0 for null input. " +
+        "You could use other typed Scala UDF APIs to avoid this problem, or set " +
+        s"${SQLConf.LEGACY_ALLOW_UNTYPED_SCALA_UDF.key} to true and use this API with caution."
+      throw new AnalysisException(errorMsg)
+    }
     SparkUserDefinedFunction(f, dataType, inputSchemas = Nil)
   }
```

Reviewer: In the error message, we should give an example to show how to use the typed Scala UDF for implementing `udf((x: Int) => x, IntegerType)`.

Author: I see.
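To make the reviewer's point concrete, here is a hedged sketch contrasting the two overloads; the SparkSession and the data are made up for illustration:

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{col, udf}
import org.apache.spark.sql.types.IntegerType

val spark = SparkSession.builder().master("local[*]").getOrCreate()
val df = spark.sql("SELECT 1 AS i UNION ALL SELECT CAST(NULL AS INT) AS i")

// Untyped overload: no input type information, so a NULL input reaches the
// Int parameter as 0. With this PR it throws AnalysisException unless
// spark.sql.legacy.allowUntypedScalaUDF is set to true.
// val untyped = udf((x: Int) => x, IntegerType)

// Typed overload recommended by the error message: the TypeTags let Spark
// add a null check, so a NULL input yields NULL instead of a wrong 0.
val typed = udf((x: Int) => x)
df.select(typed(col("i"))).show()
```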
TypeTag is required for the typed UDF when creating the udf for createTransformFunc.

This is a breaking change, but I think it's better than silently changing results.
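On the TypeTag point: the typed udf overloads take implicit TypeTag evidence, so a generic caller has to carry that evidence itself. A minimal sketch of the shape of the problem (the helper name is hypothetical, not Spark ML code):

```scala
import scala.reflect.runtime.universe.TypeTag

import org.apache.spark.sql.expressions.UserDefinedFunction
import org.apache.spark.sql.functions.udf

// Hypothetical helper: wrapping something shaped like
// UnaryTransformer.createTransformFunc with the typed udf overload only
// compiles if TypeTags for both the input and output types are in scope.
def wrapTransformFunc[IN: TypeTag, OUT: TypeTag](f: IN => OUT): UserDefinedFunction =
  udf(f)
```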
We can avoid this breaking change if we know that the type parameter won't be primitive types. cc @srowen @zhengruifeng

I don't disagree, but this is trading a possible error for a definite error. In light of the recent conversations about not breaking things, is this wise? (I don't object, though.)

Yes, let's restrict this to primitive types. I think Spark ML even uses some UDFs that accept AnyRef or something to work with tuples or triples, IIRC.
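For context on the "restrict this to primitive types" suggestion, a made-up sketch of the kind of reference-type closure that is not affected by the null-to-default coercion; under this PR's default it would still be rejected, which is what the thread is weighing:

```scala
import org.apache.spark.sql.functions.udf
import org.apache.spark.sql.types.IntegerType

// Reference-type parameter: a NULL input reaches the closure as null rather
// than being coerced to a primitive default, so the silent-wrong-result
// problem does not apply here. With this PR's default config, creating this
// UDF still throws AnalysisException unless the legacy flag is enabled.
val strLen = udf((s: String) => if (s == null) null else Int.box(s.length), IntegerType)
```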
This is a developer API, so I'm wondering whether third-party implementations use primitive types and would hit the silent result change.

I think it's better to ask users to re-compile their Spark application than to just tell them that they may hit a result change.