@AngersZhuuuu
Contributor

AngersZhuuuu commented Jul 9, 2020

What changes were proposed in this pull request?

When a UDAF is defined by a class that extends UserDefinedAggregateFunction and is called with Hive support enabled, HiveSessionCatalog first calls super.makeFunctionExpression.

If that call fails, for example because the function needs 2 parameters and only 1 is given, the error is caught and the exception shown to the user says only

No handler for UDF/UDAF/UDTF xxxxxxxx

This is confusing for developers; we should also show the error thrown by the super method.

For this PR's UT:
Before this change, the exception thrown is

No handler for UDF/UDAF/UDTF 'org.apache.spark.sql.hive.execution.LongProductSum'; line 1 pos 7

After this PR, the exception thrown is

Spark UDAF Error: Invalid number of arguments for function longProductSum. Expected: 2; Found: 1;
Hive UDF/UDAF/UDTF Error: No handler for UDF/UDAF/UDTF 'org.apache.spark.sql.hive.execution.LongProductSum'; line 1 pos 7

Why are the changes needed?

Show a more detailed error message when a UDAF call fails.

Does this PR introduce any user-facing change?

Users will see a more detailed error message when using a Spark SQL UDAF with Hive support enabled.

How was this patch tested?

Added UT

@AngersZhuuuu
Contributor Author

cc @dongjoon-hyun @maropu

SparkQA commented Jul 9, 2020

Test build #125474 has started for PR 29054 at commit 88bf5d7.

@AngersZhuuuu
Contributor Author

retest this please

import org.apache.spark.unsafe.UnsafeAlignedOffset


class ScalaAggregateFunction(schema: StructType) extends UserDefinedAggregateFunction {
Member

@AngersZhuuuu why do we need to move classes? Let's separate refactoring and the fixes when possible.

Contributor Author

> @AngersZhuuuu why do we need to move classes? Let's separate refactoring and the fixes when possible.

That was a mistake; I changed it back, thanks.

} else if (classOf[GenericUDTF].isAssignableFrom(clazz)) {
udfExpr = Some(HiveGenericUDTF(name, new HiveFunctionWrapper(clazz.getName), input))
udfExpr.get.asInstanceOf[HiveGenericUDTF].elementSchema // Force it to check data types.
Try(super.makeFunctionExpression(name, clazz, input)) match {
Member

@AngersZhuuuu Seems we don't need to change getOrElse to match.

Member

+1

Contributor Author

> @AngersZhuuuu Seems we don't need to change getOrElse to match.

I'm confused: with getOrElse, how can I get the exception?

Member

If so, how about using try instead of Try?

Member

try is the way to go, as in the guideline. But in general I would like to avoid changing the indentation of unrelated code, which makes it difficult to see the real diff. Such a pattern makes it more difficult to backport and revert. Let's avoid this next time, @AngersZhuuuu. The actual diff seems to be just adding more detail to the exception message.

Contributor Author

> try is the way to go, as in the guideline. But in general I would like to avoid changing the indentation of unrelated code, which makes it difficult to see the real diff. Such a pattern makes it more difficult to backport and revert. Let's avoid this next time, @AngersZhuuuu. The actual diff seems to be just adding more detail to the exception message.

Since I often backport PRs, I understand the point you mentioned; I will pay more attention to this.
Yes, the actual diff is to show more detail in the message, but since we need the exception carried by Failure(exception), we can't use getOrElse().
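For readers following this thread, here is a minimal, Spark-free sketch of the point being discussed. It assumes two hypothetical stand-ins, `sparkLookup` and `hiveLookup`, in place of super.makeFunctionExpression and the Hive path; the exception types are plain JDK ones chosen only for illustration. It shows that `Try(...).getOrElse` drops the first failure, while matching on `Failure` or using a plain `try`/`catch` keeps it available for the final message.

```scala
import scala.util.{Failure, Success, Try}

object TryVersusTryCatch {
  // Hypothetical stand-in for super.makeFunctionExpression: always fails here.
  def sparkLookup(name: String): String =
    throw new IllegalArgumentException(s"Invalid number of arguments for function $name")

  // Hypothetical stand-in for the Hive lookup: also fails.
  def hiveLookup(name: String): String =
    throw new IllegalStateException(s"No handler for UDF/UDAF/UDTF '$name'")

  // Runs the fallback and, if it fails too, reports both messages.
  private def fallback(name: String, sparkError: Throwable): String =
    try hiveLookup(name) catch {
      case hiveError: IllegalStateException =>
        throw new IllegalStateException(
          s"${sparkError.getMessage}; ${hiveError.getMessage}", hiveError)
    }

  // getOrElse: the first failure is discarded before the fallback runs.
  def withGetOrElse(name: String): String =
    Try(sparkLookup(name)).getOrElse(hiveLookup(name))

  // match: the Failure case binds the first exception.
  def withMatch(name: String): String =
    Try(sparkLookup(name)) match {
      case Success(expr) => expr
      case Failure(sparkError) => fallback(name, sparkError)
    }

  // Plain try/catch: same information, without wrapping in Try.
  def withTryCatch(name: String): String =
    try sparkLookup(name) catch {
      case sparkError: IllegalArgumentException => fallback(name, sparkError)
    }
}
```

In this sketch, `withGetOrElse` surfaces only the "No handler" message, whereas `withMatch` and `withTryCatch` surface both.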

val functionClass = "org.apache.spark.sql.hive.execution.LongProductSum"
withUserDefinedFunction(functionName -> true) {
sql(s"CREATE TEMPORARY FUNCTION $functionName AS '$functionClass'")
val e1 = intercept[AnalysisException] {
Member

nit: e1 -> e

Contributor Author

> nit: e1 -> e

Done

@maropu
Member

maropu commented Jul 10, 2020

Please describe exception messages before/after this PR in the description.

SparkQA commented Jul 10, 2020

Test build #125532 has finished for PR 29054 at commit 88bf5d7.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

SparkQA commented Jul 10, 2020

Test build #125550 has finished for PR 29054 at commit 5dd3169.

  • This patch fails due to an unknown error code, -9.
  • This patch merges cleanly.
  • This patch adds no public classes.

@AngersZhuuuu
Contributor Author

retest this please

@AngersZhuuuu
Contributor Author

> Please describe exception messages before/after this PR in the description.

Done

@maropu
Member

maropu commented Jul 10, 2020

  1. Make Expression failed in SessionCatalog for function name = longProductSum: Invalid number of arguments for function longProductSum. Expected: 2; Found: 1;
  2. No handler for UDF/UDAF/UDTF 'org.apache.spark.sql.hive.execution.LongProductSum'; line 1 pos 7

Do we need the numbers 1 and 2? How about reformatting it like this?

No handler for UDF/UDAF/UDTF 'org.apache.spark.sql.hive.execution.LongProductSum' because; Invalid number of arguments for function longProductSum

@AngersZhuuuu
Contributor Author

> Do we need the numbers 1 and 2? How about reformatting it like this?
>
> No handler for UDF/UDAF/UDTF 'org.apache.spark.sql.hive.execution.LongProductSum' because; Invalid number of arguments for function longProductSum

I prefer clearer information that lets the user know the error comes from super.makeFunctionExpression. Also,

No handler for UDF/UDAF/UDTF 'org.apache.spark.sql.hive.execution.LongProductSum'

has no relation to

Invalid number of arguments for function longProductSum

SparkQA commented Jul 10, 2020

Test build #125595 has finished for PR 29054 at commit 4e6b506.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

}
}

test("Hive mode use spark udaf should show error") {
Member

Let's add a JIRA prefix.

Contributor Author

> Let's add a JIRA prefix.

Done

SparkQA commented Jul 10, 2020

Test build #125592 has finished for PR 29054 at commit 5dd3169.

  • This patch fails PySpark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

SparkQA commented Jul 10, 2020

Test build #125601 has finished for PR 29054 at commit 91ceea0.

  • This patch fails PySpark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

SparkQA commented Jul 10, 2020

Test build #125620 has finished for PR 29054 at commit 8c3faed.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@maropu
Member

maropu commented Jul 10, 2020

> I prefer clearer information that lets the user know the error comes from super.makeFunctionExpression.

But the relationship between 1. and 2. looks ambiguous to me. Does the message mean that there are two reasons for the failure? Could you polish the message further?

@AngersZhuuuu
Contributor Author

> But the relationship between 1. and 2. looks ambiguous to me. Does the message mean that there are two reasons for the failure? Could you polish the message further?

Yes. When we use Spark SQL with Hive support enabled, the actual SessionCatalog is HiveSessionCatalog, and when we create and use a UDAF, it calls makeFunctionExpression. In the UDAF code path you can see that it calls super.makeFunctionExpression first, which checks whether the UDAF follows Spark's UDAF definition (a class implementing UserDefinedAggregateFunction); if that fails, it tries Hive's UDAF handling.

In my test case, the function follows Spark's UDAF definition, but makeFunctionExpression fails because I call it with the wrong number of arguments. With the original code, it only shows

No handler for UDF/UDAF/UDTF 'org.apache.spark.sql.hive.execution.LongProductSum'

But what we really need to know is

Invalid number of arguments for function longProductSum

These two errors come from different levels: 1. Spark UDAF, 2. Hive UDAF.
The "1." and "2." labels do seem ambiguous; how about "For Spark UDAF:" and "For Hive UDAF:" to make it clearer?
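To make the two levels concrete, here is a small self-contained sketch of the lookup order described above. It does not use Spark; `LookupException`, `makeSparkExpression`, `makeHiveExpression` and `makeExpression` are hypothetical names standing in for the real catalog methods, and the labels follow the "After this PR" message in the description.

```scala
object TwoLevelLookup {
  // Hypothetical stand-in for AnalysisException.
  final case class LookupException(message: String) extends Exception(message)

  // Level 1: stand-in for the Spark UDAF path, which validates the argument count.
  def makeSparkExpression(name: String, expected: Int, found: Int): String =
    if (expected == found) s"SparkUDAF($name)"
    else throw LookupException(
      s"Invalid number of arguments for function $name. Expected: $expected; Found: $found")

  // Level 2: stand-in for the Hive path. The class is not a Hive function, so it fails.
  def makeHiveExpression(className: String): String =
    throw LookupException(s"No handler for UDF/UDAF/UDTF '$className'")

  // Try Spark first, then Hive; if both fail, label each error with its level.
  def makeExpression(name: String, className: String, expected: Int, found: Int): String =
    try makeSparkExpression(name, expected, found) catch {
      case sparkError: LookupException =>
        try makeHiveExpression(className) catch {
          case hiveError: LookupException =>
            throw LookupException(
              s"Spark UDAF Error: ${sparkError.getMessage}\n" +
                s"Hive UDF/UDAF/UDTF Error: ${hiveError.getMessage}")
        }
    }
}
```

Calling `makeExpression` with a mismatched argument count yields one exception whose message carries both labeled lines, so the reader can tell which level reported which problem.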

SparkQA commented Jul 11, 2020

Test build #125681 has finished for PR 29054 at commit b813656.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@AngersZhuuuu
Contributor Author

cc @HyukjinKwon @maropu Any update?

// Current thread context classloader may not be the one loaded the class. Need to switch
// context classloader to initialize instance properly.
Utils.withContextClassLoader(clazz.getClassLoader) {
Try(super.makeFunctionExpression(name, clazz, input)).getOrElse {
Member

@AngersZhuuuu, can you get rid of these unrelated diffs?

Contributor Author

AngersZhuuuu commented Aug 21, 2020

> @AngersZhuuuu, can you get rid of these unrelated diffs?

How about the current change? It doesn't change the indentation. cc @maropu

SparkQA commented Sep 29, 2020

Kubernetes integration test status success
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/33847/

SparkQA commented Sep 29, 2020

Test build #129217 has finished for PR 29054 at commit 2db41fc.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

throw new AnalysisException(s"Invalid number of arguments for function $name. " +
s"Expected: ${e.inputTypes.size}; Found: ${input.size}")
throw new AnalysisException(s"Invalid number of arguments for " +
s"function $name. Expected: ${e.inputTypes.size}; Found: ${input.size}")
Contributor

unnecessary change

Contributor Author

> unnecessary change

Done

// If `super.makeFunctionExpression` throw `InvalidUDFClassException`, we construct
// Hive UDF/UDAF/UDTF with function definition.
makeHiveFunctionExpression(name, clazz, input)
case e: AnalysisException =>
Contributor

nit: case e => throw e

The only exception is InvalidUDFClassException, where we need to try with Hive UDF class. For other exceptions, just re-throw.

Contributor Author

> nit: case e => throw e
>
> The only exception is InvalidUDFClassException, where we need to try with Hive UDF class. For other exceptions, just re-throw.

Done
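As an illustration of the rule agreed on here, the following Spark-free sketch falls back to the Hive path only for a dedicated "invalid class" exception and lets every other error propagate unchanged. `AnalysisError`, `InvalidClassError`, `FunctionKind` and the three methods are hypothetical stand-ins, not Spark's actual classes or signatures.

```scala
object FallbackOnInvalidClassOnly {
  // Hypothetical stand-ins for AnalysisException and InvalidUDFClassException.
  class AnalysisError(message: String) extends Exception(message)
  class InvalidClassError(message: String) extends AnalysisError(message)

  // Whether the registered class is a Spark UDAF or a Hive function.
  sealed trait FunctionKind
  case object SparkUdaf extends FunctionKind
  case object HiveFunction extends FunctionKind

  // Stand-in for the Spark path: rejects non-Spark classes and wrong argument counts.
  def makeSparkExpression(name: String, kind: FunctionKind, expected: Int, found: Int): String =
    kind match {
      case SparkUdaf if expected != found =>
        throw new AnalysisError(
          s"Invalid number of arguments for function $name. Expected: $expected; Found: $found")
      case SparkUdaf => s"SparkUDAF($name)"
      case HiveFunction => throw new InvalidClassError(s"No handler for UDAF '$name'")
    }

  // Stand-in for the Hive path.
  def makeHiveExpression(name: String): String = s"HiveFunction($name)"

  def makeExpression(name: String, kind: FunctionKind, expected: Int, found: Int): String =
    try makeSparkExpression(name, kind, expected, found) catch {
      // Only an invalid class triggers the Hive fallback.
      case _: InvalidClassError => makeHiveExpression(name)
      // Any other exception is not matched here and propagates to the caller as-is.
    }
}
```

With this shape, a Spark UDAF called with the wrong number of arguments reports only the argument-count error, and the "No handler" message never appears for it.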

parser,
functionResourceLoader) {

def makeHiveFunctionExpression(
Contributor

private

Contributor Author

> private

Done

SparkQA commented Sep 29, 2020

Test build #129240 has finished for PR 29054 at commit 36229e3.

  • This patch fails to build.
  • This patch merges cleanly.
  • This patch adds no public classes.

SparkQA commented Sep 29, 2020

Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/33856/

SparkQA commented Sep 29, 2020

Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/33857/

SparkQA commented Sep 29, 2020

Kubernetes integration test status success
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/33856/

@AngersZhuuuu
Contributor Author

retest this please

SparkQA commented Sep 29, 2020

Kubernetes integration test status failure
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/33857/

@AngersZhuuuu
Contributor Author

retest this please

SparkQA commented Sep 29, 2020

Test build #129231 has finished for PR 29054 at commit 3d9b6e3.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
  • class InvalidUDFClassException protected[sql](message: String)

SparkQA commented Sep 29, 2020

Test build #129244 has finished for PR 29054 at commit 36229e3.

  • This patch fails to build.
  • This patch merges cleanly.
  • This patch adds no public classes.

SparkQA commented Sep 29, 2020

Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/33861/

SparkQA commented Sep 29, 2020

Kubernetes integration test status success
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/33861/

SparkQA commented Sep 29, 2020

Test build #129239 has finished for PR 29054 at commit 42064d7.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

* Thrown when a query failed for invalid function class, usually because a SQL
* function's class does not follow the rules of the UDF/UDAF/UDTF class definition.
*/
class InvalidUDFClassException private[sql](message: String)
Contributor

We don't need private[sql] as it's already in a private package.

@cloud-fan
Contributor

thanks, merging to master!

@cloud-fan
Contributor

Sorry I misread the test report. The last commit actually failed to compile.

Since the last commit is very minor and the second last commit passed all tests, I'm sending a followup PR to fix compilation instead of reverting it. #29955

cloud-fan added a commit that referenced this pull request Oct 6, 2020
Fix a mistake when merging #29054

Closes #29955 from cloud-fan/hot-fix.

Authored-by: Wenchen Fan <[email protected]>
Signed-off-by: Wenchen Fan <[email protected]>
@AngersZhuuuu
Contributor Author

> Sorry I misread the test report. The last commit actually failed to compile.
>
> Since the last commit is very minor and the second last commit passed all tests, I'm sending a followup PR to fix compilation instead of reverting it. #29955

Yeah, I missed this too...
