
@jeanlyn
Contributor

@jeanlyn jeanlyn commented Mar 18, 2015

When we use the Spark SQL CLI to add a jar dynamically, we get a java.lang.ClassNotFoundException when we use a class from that jar to create a UDF. For example:

spark-sql> add jar /home/jeanlyn/hello.jar;
spark-sql> create temporary function hello as 'hello';
spark-sql> select hello(name) from person;
Job aborted due to stage failure: Task 0 in stage 1.0 failed 1 times, most recent failure: Lost task 0.0 in stage 1.0 (TID 1, localhost): java.lang.ClassNotFoundException: hello

We can use the Spark physical plan to fix this problem.

@AmplabJenkins

Can one of the admins verify this patch?

@marmbrus
Contributor

ok to test

@marmbrus
Contributor

Would it be possible to add a test case in the CLI suite? /cc @liancheng

@SparkQA

SparkQA commented Mar 18, 2015

Test build #28764 has started for PR 5079 at commit ca95849.

  • This patch merges cleanly.

@SparkQA

SparkQA commented Mar 18, 2015

Test build #28764 has finished for PR 5079 at commit ca95849.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@AmplabJenkins

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/28764/
Test PASSed.

@liancheng
Contributor

@marmbrus We can merge this PR first; I'm trying to write a test case for it.

@SparkQA

SparkQA commented Mar 18, 2015

Test build #28792 has started for PR 5079 at commit ca78d72.

  • This patch merges cleanly.

@jeanlyn
Contributor Author

jeanlyn commented Mar 18, 2015

Updated. @liancheng @marmbrus I have tried to add a test for this patch. Could you take a look at the test? Thanks!

@SparkQA

SparkQA commented Mar 18, 2015

Test build #28792 has finished for PR 5079 at commit ca78d72.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@AmplabJenkins

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/28792/
Test PASSed.

@liancheng
Contributor

Hm, the problem is that we should also add the source code of hello.jar (probably after renaming it to TestUdf.jar). But I'm not quite sure where to put it for now. That's why I said we can merge the fix first, and later add the test in a proper way.

In Hive, there is a separate Maven module, itest/test-serde, which produces a similar TestSerDe.jar solely for testing purposes. I wonder whether there is a simpler way to handle this. Adding a module seems like overkill.

/cc @marmbrus @yhuai @pwendell

@jeanlyn
Contributor Author

jeanlyn commented Mar 18, 2015

Thanks @liancheng for explaining. You are right, this needs more consideration. So, should I remove the test?

@liancheng
Contributor

Your test case itself makes sense. Let's wait for more comments first :)

@jeanlyn
Contributor Author

jeanlyn commented Mar 18, 2015

Ok.

@yhuai
Contributor

yhuai commented Mar 18, 2015

Would it be better to put the jar in sql/hive-thriftserver/src/test/resources/jar (not in data)? Also, what is in that jar? A hello world function?

@SparkQA

SparkQA commented Mar 19, 2015

Test build #28849 has started for PR 5079 at commit 2de3945.

  • This patch merges cleanly.

@jeanlyn
Contributor Author

jeanlyn commented Mar 19, 2015

@yhuai, it contains a simple function:

    public String evaluate(String str) {
        try {
            return "hello " + str;
        } catch (Exception e) {
            return null;
        }
    }
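
For readers who want to try the function without Hive on the classpath, here is a minimal standalone sketch of the same logic. The class name HelloLogic and the main method are hypothetical and not part of the jar discussed in this thread; only the body of evaluate is taken from the snippet above.

```java
public class HelloLogic {
    // Same body as the evaluate method quoted above, without the Hive UDF base class.
    public static String evaluate(String str) {
        try {
            return "hello " + str;
        } catch (Exception e) {
            return null;
        }
    }

    public static void main(String[] args) {
        System.out.println(evaluate("world")); // prints "hello world"
        System.out.println(evaluate(null));    // prints "hello null"
    }
}
```

Note that Java string concatenation turns a null argument into the text "null", so the catch branch is not reached for null input.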

@chenghao-intel
Contributor

Is this the same issue as #4586? We have had quite a headache with the class loading/unloading problem. @adrian-wang, can you review this as well?

@jeanlyn
Contributor Author

jeanlyn commented Mar 19, 2015

@chenghao-intel I am not clear on what problem #4586 tries to fix. If #4586 tries to fix the problem I mentioned, I think reusing SparkContext.addJar is enough to fix the class loading/unloading problem. Right?

@adrian-wang
Contributor

Actually, I tried this approach before; it worked in unit tests but not in the shell.
I'll double check this in the afternoon.

@SparkQA

SparkQA commented Mar 19, 2015

Test build #28849 has finished for PR 5079 at commit 2de3945.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@AmplabJenkins

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/28849/
Test PASSed.

@jeanlyn
Contributor Author

jeanlyn commented Mar 19, 2015

@adrian-wang Do you mean it does not work in spark-shell?

@adrian-wang
Contributor

@jeanlyn have you tried a clean assembly build and running it in spark-sql?

@adrian-wang
Contributor

I double-checked your code against the latest master, and the problem still persists.
Considering this is just a dup of #4586, maybe we should close this one.

Thanks!

@jeanlyn
Contributor Author

jeanlyn commented Mar 19, 2015

@adrian-wang, I tested in spark-sql and got the correct result with my test case. Can you provide your test case? By the way, when I debugged this issue I found that in thrift-server mode it also reuses SparkContext.addJar.

@adrian-wang
Contributor

@jeanlyn you can just try executing mapjoin_addjar.q in spark-sql.
Note that you need to point it at a valid jar location.

@chenghao-intel
Contributor

The tests in the two PRs are different: this PR is about the UDF jar, while #4586 is about the SerDe jar. They may be loaded by different class loaders.

@jeanlyn can you paste the full code for the UDF function?
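
On the point above about different class loaders: whether a class can be found depends on which loader is asked, and that can be illustrated outside Spark. The sketch below is not Spark or Hive code. The class name ClassLoaderVisibility and its helper methods are hypothetical, and the "isolated" loader is only a stand-in for any loader that was never given the jar. It asks two loaders for the same class name and reports whether each can resolve it.

```java
import java.net.URL;
import java.net.URLClassLoader;

public class ClassLoaderVisibility {
    // Returns true if the given loader can resolve the named class.
    public static boolean visibleTo(ClassLoader loader, String className) {
        try {
            Class.forName(className, false, loader);
            return true;
        } catch (ClassNotFoundException e) {
            return false;
        }
    }

    // A loader with no URLs and no application parent: it only sees classes
    // from the bootstrap loader, not classes on the application classpath.
    public static ClassLoader isolatedLoader() {
        return new URLClassLoader(new URL[0], null);
    }

    public static void main(String[] args) {
        String name = ClassLoaderVisibility.class.getName();
        ClassLoader appLoader = ClassLoaderVisibility.class.getClassLoader();

        System.out.println("application loader: " + visibleTo(appLoader, name));
        System.out.println("isolated loader:    " + visibleTo(isolatedLoader(), name));
        System.out.println("isolated, String:   " + visibleTo(isolatedLoader(), "java.lang.String"));
    }
}
```

The same class name resolves through the loader that defined it and fails with ClassNotFoundException through the loader that was never pointed at it. That matches the symptom reported at the top of this PR, although which loaders are actually involved in spark-sql is exactly what this thread is still trying to establish.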

@jeanlyn
Contributor Author

jeanlyn commented Mar 19, 2015

@chenghao-intel my full code is:

import org.apache.hadoop.hive.ql.exec.UDF;

public class hello extends UDF {
    public String evaluate(String str) {
        try {
            return "hello " + str;
        } catch (Exception e) {
            return null;
        }
    }
}

@adrian-wang, I also tested mapjoin_addjar.q in spark-sql.
I got this exception on CREATE TABLE:

15/03/19 14:41:36 ERROR DDLTask: java.lang.NoSuchFieldError: CHAR
    at org.apache.spark.sql.hive.HiveContext.runHive(HiveContext.scala:310)
    at org.apache.spark.sql.hive.HiveContext.runSqlHive(HiveContext.scala:277)
    at org.apache.spark.sql.hive.execution.NativeCommand.sideEffectResult$lzycompute(NativeCommand.scala:35)
    at org.apache.spark.sql.hive.execution.NativeCommand.sideEffectResult(NativeCommand.scala:35)

But this does not seem to be a jar-loading problem, because when I do not run

add jar ${system:maven.local.repository}/org/apache/hive/hcatalog/hive-hcatalog-core/${system:hive.version}/hive-hcatalog-core-${system:hive.version}.jar;

I get the following exception when I create the table:

15/03/19 14:54:51 ERROR DDLTask: org.apache.hadoop.hive.ql.metadata.HiveException: Cannot validate serde: org.apache.hive.hcatalog.data.JsonSerDe
    at org.apache.hadoop.hive.ql.exec.DDLTask.validateSerDe(DDLTask.java:3423)
    at org.apache.hadoop.hive.ql.exec.DDLTask.createTable(DDLTask.java:3553)
    at org.apache.hadoop.hive.ql.exec.DDLTask.execute(DDLTask.java:252)

@adrian-wang
Contributor

@jeanlyn we are not getting the same thing. Even our .q files differ. I don't have CHAR in my .q file.

@jeanlyn
Contributor Author

jeanlyn commented Mar 19, 2015

I don't have CHAR in mapjoin_addjar.q either. I can find only one mapjoin_addjar.q, and the path of my file is
sql/hive/src/test/resources/ql/src/test/queries/clientpositive/mapjoin_addjar.q

set hive.auto.convert.join=true;
set hive.auto.convert.join.use.nonstaged=false;

add jar ${system:maven.local.repository}/org/apache/hive/hcatalog/hive-hcatalog-core/${system:hive.version}/hive-hcatalog-core-${system:hive.version}.jar;

CREATE TABLE t1 (a string, b string)
ROW FORMAT SERDE 'org.apache.hive.hcatalog.data.JsonSerDe'
;
LOAD DATA LOCAL INPATH "../../data/files/sample.json" INTO TABLE t1;
select * from src join t1 on src.key =t1.a;
drop table t1;
set hive.auto.convert.join=false;

Maybe we can discuss this offline?

@jeanlyn
Contributor Author

jeanlyn commented Mar 23, 2015

After talking with @adrian-wang offline, I realized this PR still leaves some class loader problems, so I am closing it.

@jeanlyn jeanlyn closed this Mar 23, 2015
@jeanlyn jeanlyn deleted the SPARK-6392 branch July 3, 2015 05:52