-
Notifications
You must be signed in to change notification settings - Fork 29k
[SPARK-32243][SQL]HiveSessionCatalog call super.makeFunctionExpression should throw earlier when got Spark UDAF Invalid arguments number error #29054
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
|
Test build #125474 has started for PR 29054 at commit |
|
retest this please |
| import org.apache.spark.unsafe.UnsafeAlignedOffset | ||
|
|
||
|
|
||
| class ScalaAggregateFunction(schema: StructType) extends UserDefinedAggregateFunction { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@AngersZhuuuu why do we need to move classes? Let's separate refactoring and the fixes when possible.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@AngersZhuuuu why do we need to move classes? Let's separate refactoring and the fixes when possible.
Make a mistake, change back, thanks
| } else if (classOf[GenericUDTF].isAssignableFrom(clazz)) { | ||
| udfExpr = Some(HiveGenericUDTF(name, new HiveFunctionWrapper(clazz.getName), input)) | ||
| udfExpr.get.asInstanceOf[HiveGenericUDTF].elementSchema // Force it to check data types. | ||
| Try(super.makeFunctionExpression(name, clazz, input)) match { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@AngersZhuuuu Seems we don't need to change getOrElse to match.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+1
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@AngersZhuuuu Seems we don't need to change
getOrElsetomatch.
Confused, with getOrElse how can I get the exception
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If so, how about using try instead of Try?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
try is the way as in the guideline. But in general I would like to avoid changing the indentation for unrelated codes and make it difficult to see what's the real diff. Such pattern makes it more difficult to backport and revert. Let's avoid this @AngersZhuuuu next time. The actual diff seems just adding more message in the exception.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
tryis the way as in the guideline. But in general I would like to avoid changing the indentation for unrelated codes and make it difficult to see what's the real diff. Such pattern makes it more difficult to backport and revert. Let's avoid this @AngersZhuuuu next time. The actual diff seems just adding more message in the exception.
Since I always backport pr, I know the point you mentioned, I will notice this more carefully.
Yea, the actual diff is to show more message but since we need to know the exception in return Failure(exception), so we can't use getOrElse()
88bf5d7 to
5dd3169
Compare
| val functionClass = "org.apache.spark.sql.hive.execution.LongProductSum" | ||
| withUserDefinedFunction(functionName -> true) { | ||
| sql(s"CREATE TEMPORARY FUNCTION $functionName AS '$functionClass'") | ||
| val e1 = intercept[AnalysisException] { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: e1 -> e
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit:
e1->e
Done
|
Please describe exception messages before/after this PR in the description. |
|
Test build #125532 has finished for PR 29054 at commit
|
|
Test build #125550 has finished for PR 29054 at commit
|
|
retest this please |
Done |
We need the number |
For me , I prefer more clear information let user know it's wrong form don't have relation about |
|
Test build #125595 has finished for PR 29054 at commit
|
| } | ||
| } | ||
|
|
||
| test("Hive mode use spark udaf should show error") { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's add a JIRA prefix.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's add a JIRA prefix.
Done
|
Test build #125592 has finished for PR 29054 at commit
|
|
Test build #125601 has finished for PR 29054 at commit
|
|
Test build #125620 has finished for PR 29054 at commit
|
But, the relationship of |
Yea, when we use Spark SQL in support hive mode, the real SessionCatalog is HiveSessionCatalog, when we create and use UDAF, it call So in my test case, it's follow spark's UDAF definition, but But what we real need know is These two error is from different level 1. Spark UDAF 2. Hive UDAF. |
|
Test build #125681 has finished for PR 29054 at commit
|
|
cc @HyukjinKwon @maropu Any update? |
| // Current thread context classloader may not be the one loaded the class. Need to switch | ||
| // context classloader to initialize instance properly. | ||
| Utils.withContextClassLoader(clazz.getClassLoader) { | ||
| Try(super.makeFunctionExpression(name, clazz, input)).getOrElse { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@AngersZhuuuu, can you get rid of this unrelated diffs?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@AngersZhuuuu, can you get rid of this unrelated diffs?
How about current change, it won't change indentation. cc @maropu
|
Kubernetes integration test status success |
|
Test build #129217 has finished for PR 29054 at commit
|
...catalyst/src/main/scala/org/apache/spark/sql/catalyst/catalog/InvalidUDFClassException.scala
Outdated
Show resolved
Hide resolved
| throw new AnalysisException(s"Invalid number of arguments for function $name. " + | ||
| s"Expected: ${e.inputTypes.size}; Found: ${input.size}") | ||
| throw new AnalysisException(s"Invalid number of arguments for " + | ||
| s"function $name. Expected: ${e.inputTypes.size}; Found: ${input.size}") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
unnecessary change
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
unnecessary change
Done
| // If `super.makeFunctionExpression` throw `InvalidUDFClassException`, we construct | ||
| // Hive UDF/UDAF/UDTF with function definition. | ||
| makeHiveFunctionExpression(name, clazz, input) | ||
| case e: AnalysisException => |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: case e => throw e
The only exception is InvalidUDFClassException, where we need to try with Hive UDF class. For other exceptions, just re-throw.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit:
case e => throw eThe only exception is
InvalidUDFClassException, where we need to try with Hive UDF class. For other exceptions, just re-throw.
Done
| parser, | ||
| functionResourceLoader) { | ||
|
|
||
| def makeHiveFunctionExpression( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
private
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
private
Done
|
Test build #129240 has finished for PR 29054 at commit
|
|
Kubernetes integration test starting |
|
Kubernetes integration test starting |
|
Kubernetes integration test status success |
|
retest this please |
|
Kubernetes integration test status failure |
|
retest this please |
|
Test build #129231 has finished for PR 29054 at commit
|
|
Test build #129244 has finished for PR 29054 at commit
|
|
Kubernetes integration test starting |
|
Kubernetes integration test status success |
|
Test build #129239 has finished for PR 29054 at commit
|
| * Thrown when a query failed for invalid function class, usually because a SQL | ||
| * function's class does not follow the rules of the UDF/UDAF/UDTF class definition. | ||
| */ | ||
| class InvalidUDFClassException private[sql](message: String) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We don't need private[sql] as it's already in a private package.
|
thanks, merging to master! |
|
Sorry I misread the test report. The last commit actually failed to compile. Since the last commit is very minor and the second last commit passed all tests, I'm sending a followup PR to fix compilation instead of reverting it. #29955 |
Fix a mistake when merging #29054 Closes #29955 from cloud-fan/hot-fix. Authored-by: Wenchen Fan <[email protected]> Signed-off-by: Wenchen Fan <[email protected]>
Yea, I miss this too... |
What changes were proposed in this pull request?
When we create a UDAF function use class extended
UserDefinedAggregeteFunction, when we call the function, in support hive mode, in HiveSessionCatalog, it will call super.makeFunctionExpression,but it will catch error such as the function need 2 parameter and we only give 1, throw exception only show
This is confused for develop , we should show error thrown by super method too,
For this pr's UT :
Before change, throw Exception like
After this pr, throw exception
Why are the changes needed?
Show more detail error message when define UDAF
Does this PR introduce any user-facing change?
People will see more detail error message when use spark sql's UDAF in hive support Mode
How was this patch tested?
Added UT