[BUG] Null pointer error when applying the UDF #2

ranmx · 2017-09-28T11:58:38Z

Hello,

I tried to apply these UDFs in spark with hive support. Here is the code:

// register UDF
spark.sql("create temporary function id_card_province as 'cc.shanruifeng.functions.card.UDFChinaIdCardProvince'");
        	
// get file
Dataset<Row> rawdata = spark.read().csv("./src/main/resources/starM.csv");

// use UDF
rawdata.createOrReplaceTempView("starM");
Dataset<Row> udfModified = spark.sql("SELECT *, id_card_province(_c13) FROM starM");
udfModified.show();

and I got this error:

org.apache.hadoop.hive.ql.metadata.HiveException: Unable to execute method public org.apache.hadoop.io.Text cc.shanruifeng.functions.card.UDFChinaIdCardProvince.evaluate(org.apache.hadoop.io.Text)  on object cc.shanruifeng.functions.card.UDFChinaIdCardProvince@5c622859 of class cc.shanruifeng.functions.card.UDFChinaIdCardProvince with arguments {652423184510291234:org.apache.hadoop.io.Text} of size 1
	at org.apache.hadoop.hive.ql.exec.FunctionRegistry.invoke(FunctionRegistry.java:981)
	at org.apache.spark.sql.hive.HiveSimpleUDF.eval(hiveUDFs.scala:91)
	at org.apache.spark.sql.catalyst.expressions.GeneratedClass$SpecificUnsafeProjection.apply_6$(Unknown Source)
	at org.apache.spark.sql.catalyst.expressions.GeneratedClass$SpecificUnsafeProjection.apply(Unknown Source)
	at org.apache.spark.sql.catalyst.expressions.GeneratedClass$SpecificUnsafeProjection.apply(Unknown Source)
	at scala.collection.Iterator$$anon$11.next(Iterator.scala:409)
	at org.apache.spark.sql.execution.SparkPlan$$anonfun$2.apply(SparkPlan.scala:235)
	at org.apache.spark.sql.execution.SparkPlan$$anonfun$2.apply(SparkPlan.scala:228)
	at org.apache.spark.rdd.RDD$$anonfun$mapPartitionsInternal$1$$anonfun$apply$25.apply(RDD.scala:827)
	at org.apache.spark.rdd.RDD$$anonfun$mapPartitionsInternal$1$$anonfun$apply$25.apply(RDD.scala:827)
	at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
	at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:323)
	at org.apache.spark.rdd.RDD.iterator(RDD.scala:287)
	at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:87)
	at org.apache.spark.scheduler.Task.run(Task.scala:108)
	at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:335)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
	at java.lang.Thread.run(Thread.java:745)
Caused by: java.lang.reflect.InvocationTargetException
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:498)
	at org.apache.hadoop.hive.ql.exec.FunctionRegistry.invoke(FunctionRegistry.java:957)
	... 18 more
Caused by: java.lang.NullPointerException
	at org.apache.hadoop.io.Text.encode(Text.java:450)
	at org.apache.hadoop.io.Text.set(Text.java:198)
	at cc.shanruifeng.functions.card.UDFChinaIdCardProvince.evaluate(UDFChinaIdCardProvince.java:23)
	... 23 more

Could you please give me some advice on this problem? The input column itself contains no null values.

ranmx · 2017-09-28T13:01:12Z

I found the problem.
It happens because org.apache.hadoop.io.Text cannot accept null: when the UDF's result is null, passing it to Text.set() throws the NullPointerException shown in the trace.
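For anyone hitting the same trace, here is a minimal sketch of the kind of null guard that avoids it. It is not the project's actual code: the class names, the two-entry lookup table, and the TextStandIn holder are all hypothetical, and TextStandIn only imitates the one relevant behaviour of org.apache.hadoop.io.Text (set(null) throws a NullPointerException) so the example runs without Hadoop on the classpath.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Objects;

// Hypothetical sketch of a null-safe UDF body; not the real UDFChinaIdCardProvince.
class NullSafeProvinceUdf {

    // Stand-in for org.apache.hadoop.io.Text (assumption: used here only so the
    // example is self-contained). Like Text.set(String), set(null) throws an NPE.
    static final class TextStandIn {
        private String value = "";

        void set(String s) {
            value = Objects.requireNonNull(s);
        }

        @Override
        public String toString() {
            return value;
        }
    }

    // Illustrative lookup table keyed by the first two digits of the ID number.
    private static final Map<String, String> PROVINCES = new HashMap<>();
    static {
        PROVINCES.put("11", "Beijing");
        PROVINCES.put("31", "Shanghai");
    }

    private final TextStandIn result = new TextStandIn();

    public TextStandIn evaluate(String idCard) {
        // Guard 1: null or too-short input yields SQL NULL instead of an exception.
        if (idCard == null || idCard.length() < 2) {
            return null;
        }
        String province = PROVINCES.get(idCard.substring(0, 2));
        // Guard 2: an unknown prefix makes the lookup return null, so return
        // null here rather than passing it on to set().
        if (province == null) {
            return null;
        }
        result.set(province);
        return result;
    }
}
```

With a guard like this, a row whose value cannot be resolved would show NULL in the output column instead of failing the whole Spark task.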

aaronshan · 2017-11-19T05:38:28Z

@ranmx Thanks for your solution. I will close this issue.

ranmx changed the title ~~when apply the udf, appears null pointer error~~ [BUG]when apply the udf, appears null pointer error Sep 28, 2017

aaronshan closed this as completed Nov 19, 2017

aaronshan self-assigned this Nov 19, 2017

aaronshan added a commit that referenced this issue Nov 19, 2017

fix bug #2

050d440

fix some uncheck null issue add 2017,2018 holiday info
