Conversation

@Leemoonsoo
Member

https://issues.apache.org/jira/browse/ZEPPELIN-262

This patch makes Zeppelin use spark-submit to run the Spark interpreter process when SPARK_HOME is defined. This will potentially solve all the configuration problems related to the Spark interpreter.

How to use?

Define the SPARK_HOME env variable in conf/zeppelin-env.sh.
Zeppelin will then use SPARK_HOME/bin/spark-submit, so you will not need any additional configuration :-)
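
For example, a minimal conf/zeppelin-env.sh entry could look like the sketch below (the Spark path is illustrative, not something this patch prescribes):

# conf/zeppelin-env.sh — point this at your Spark installation
export SPARK_HOME=/usr/local/spark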

Backward compatibility

If you have not defined SPARK_HOME, you are still able to run the Spark interpreter in the old (current) way.
However, this is no longer encouraged.

@Leemoonsoo
Member Author

Ready to merge. Please review the changes.

Member

Is there a reason the last half is taken out?

Member Author

Brought back those lines.

@bzz
Member

bzz commented Sep 3, 2015

This is an awesome improvement, thank you @Leemoonsoo
Looks great to me.

@Leemoonsoo
Member Author

I have pushed more commits that handle pyspark. Please review them, too.

Contributor

CDH? Is there a Hadoop-distribution-specific path?

Member Author

It's part of a heuristic to search for and add Hadoop jar files.
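
A rough sketch of what such a heuristic can look like (the probed directories below are illustrative assumptions, not the actual code):

# probe common Hadoop install locations and append any that exist
for dir in /usr/lib/hadoop /opt/cloudera/parcels/CDH/lib/hadoop; do
  [ -d "${dir}" ] && ZEPPELIN_CLASSPATH+=":${dir}/*"
done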

@randerzander
Contributor

@Leemoonsoo how does z.load work with spark-submit? It seems those dependency jars should be added automatically to spark-submit's --jars argument.

@Leemoonsoo
Member Author

@randerzander Dependency jars downloaded via z.load() are loaded after the SparkContext is created, by calling sc.addJar(). So I think it will not be affected by this change.
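
For contrast, adding jars at submit time (what the question assumes) would look like the hypothetical invocation below; z.load() instead calls sc.addJar() on the already-running SparkContext, so it is independent of the submit command:

# hypothetical invocation for comparison only; the jar paths are placeholders
${SPARK_HOME}/bin/spark-submit --jars /path/to/dep1.jar,/path/to/dep2.jar <app-jar>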

@felixcheung
Member

looks good!

@Leemoonsoo
Member Author

Merging if there are no more discussions.

@asfgit asfgit closed this in b4b4f55 Sep 8, 2015
Leemoonsoo added a commit to Leemoonsoo/zeppelin that referenced this pull request Sep 17, 2015
https://issues.apache.org/jira/browse/ZEPPELIN-262

This patch makes Zeppelin use spark-submit to run the Spark interpreter process when SPARK_HOME is defined. This will potentially solve all the configuration problems related to the Spark interpreter.

#### How to use?

Define the SPARK_HOME env variable in conf/zeppelin-env.sh.
Zeppelin will then use SPARK_HOME/bin/spark-submit, so you will not need any additional configuration :-)

#### Backward compatibility

If you have not defined SPARK_HOME, you are still able to run the Spark interpreter in the old (current) way.
However, this is no longer encouraged.

Author: Lee moon soo <[email protected]>

Closes apache#270 from Leemoonsoo/spark_submit and squashes the following commits:

4eb0848 [Lee moon soo] export and check SPARK_SUBMIT
a8a3440 [Lee moon soo] handle spark.files correctly for pyspark when spark-submit is used
d4acd1b [Lee moon soo] Add PYTHONPATH
c9418c6 [Lee moon soo] Bring back some entries with more commments
cac2bb8 [Lee moon soo] Take care classpath of SparkIMain
5d3154e [Lee moon soo] Remove clean. otherwise mvn clean package will remove interpreter/spark/dep directory
2d27e9c [Lee moon soo] use spark-submit to run spark interpreter process when SPARK_HOME is defined

(cherry picked from commit b4b4f55)
Signed-off-by: Lee moon soo <[email protected]>
@smusevic

smusevic commented Mar 1, 2016

Hello,
I'm testing out zeppelin-0.5.6-incubating-bin-all.tgz.

I might be wrong, but it seems to me that this change causes:

SPARK_CLASSPATH was detected (set to ':/etc/hbase/conf').
This is deprecated in Spark 1.0+.

Please instead use:
 - ./spark-submit with --driver-class-path to augment the driver classpath
 - spark.executor.extraClassPath to augment the executor classpath

16/03/01 08:11:50 WARN spark.SparkConf: Setting 'spark.executor.extraClassPath' to ':/etc/hbase/conf' as a work-around.
16/03/01 08:11:50 ERROR spark.SparkContext: Error initializing SparkContext.
org.apache.spark.SparkException: Found both spark.driver.extraClassPath and SPARK_CLASSPATH. Use only the former.
        at org.apache.spark.SparkConf$$anonfun$validateSettings$6$$anonfun$apply$8.apply(SparkConf.scala:473)
        at org.apache.spark.SparkConf$$anonfun$validateSettings$6$$anonfun$apply$8.apply(SparkConf.scala:471)
        at scala.collection.immutable.List.foreach(List.scala:318)
        at org.apache.spark.SparkConf$$anonfun$validateSettings$6.apply(SparkConf.scala:471)
        at org.apache.spark.SparkConf$$anonfun$validateSettings$6.apply(SparkConf.scala:459)
        at scala.Option.foreach(Option.scala:236)
        at org.apache.spark.SparkConf.validateSettings(SparkConf.scala:459)
        at org.apache.spark.SparkContext.<init>(SparkContext.scala:391)
        at org.apache.zeppelin.spark.SparkInterpreter.createSparkContext(SparkInterpreter.java:339)
        at org.apache.zeppelin.spark.SparkInterpreter.getSparkContext(SparkInterpreter.java:145)
        at org.apache.zeppelin.spark.SparkInterpreter.open(SparkInterpreter.java:465)
        at org.apache.zeppelin.interpreter.ClassloaderInterpreter.open(ClassloaderInterpreter.java:74)
        at org.apache.zeppelin.interpreter.LazyOpenInterpreter.open(LazyOpenInterpreter.java:68)
        at org.apache.zeppelin.interpreter.LazyOpenInterpreter.interpret(LazyOpenInterpreter.java:92)
        at org.apache.zeppelin.interpreter.remote.RemoteInterpreterServer$InterpretJob.jobRun(RemoteInterpreterServer.java:300)
        at org.apache.zeppelin.scheduler.Job.run(Job.java:169)
        at org.apache.zeppelin.scheduler.FIFOScheduler$1.run(FIFOScheduler.java:134)
        at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
        at java.util.concurrent.FutureTask.run(FutureTask.java:262)
        at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:178)
        at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:292)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
        at java.lang.Thread.run(Thread.java:745)

when conf/zeppelin-env.sh contains export SPARK_HOME=....
After removing the following text from the added line 138 in bin/interpreter.sh:

--driver-class-path "${ZEPPELIN_CLASSPATH_OVERRIDES}:${CLASSPATH}"

the issue is resolved, as suggested by this email, but something else happens:

| z
<console>:22: error: not found: value z
              z
              ^

which sadly blocks me from using z.load("path/to/jar"), which is what I really need to do.
Please note that I do not have access to change any of the files inside SPARK_HOME, including any conf files residing therein.

Is there a workaround for this? Am I doing something wrong?
Thanks in advance!
S.

@Leemoonsoo
Member Author

@smusevic It looks like you have export SPARK_CLASSPATH=/etc/hbase/conf in conf/zeppelin-env.sh.
Could you try export ZEPPELIN_CLASSPATH=/etc/hbase/conf instead?
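
That is, a sketch of the suggested change in conf/zeppelin-env.sh:

# replace the deprecated export SPARK_CLASSPATH=/etc/hbase/conf with:
export ZEPPELIN_CLASSPATH=/etc/hbase/conf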

@smusevic

smusevic commented Mar 2, 2016

@Leemoonsoo thanks for your reply. I most definitely do not have export SPARK_CLASSPATH=/etc/hbase/conf in my conf/zeppelin-env.sh; I double-checked just now.
However, is anyone aware of SPARK_CLASSPATH getting initialized during spark-submit?
SPARK_CLASSPATH does get initialized in bin/interpreter.sh, but investigation has revealed that the line in question does not execute if SPARK_HOME is set in conf/zeppelin-env.sh.
Anyway, my question now would be: if SPARK_CLASSPATH is not set when spark-submit is executed, should Zeppelin with SPARK_HOME set to some value work?
Thank you in advance!
Regards,
S.

@Leemoonsoo
Member Author

@smusevic
Right, that should work. Setting SPARK_HOME is the preferred way to configure Zeppelin with Spark.

@smusevic

smusevic commented Mar 3, 2016

Thanks, it turned out that SPARK_CLASSPATH was set in one of the shell files (bad practice...). It works fine now, thanks!

export SPARK_SUBMIT="${SPARK_HOME}/bin/spark-submit"
SPARK_APP_JAR="$(ls ${ZEPPELIN_HOME}/interpreter/spark/zeppelin-spark*.jar)"
# This will eventually pass SPARK_APP_JAR to the classpath of SparkIMain
ZEPPELIN_CLASSPATH=${SPARK_APP_JAR}
@weipuz

Hello, when I set SPARK_HOME to my external Spark, I found the zeppelin-interpreter-sparkxxx.log file was gone. I dug further and found that if I change line 79 in interpreter.sh from ZEPPELIN_CLASSPATH=${SPARK_APP_JAR} to ZEPPELIN_CLASSPATH+=${SPARK_APP_JAR}, I get all the Spark interpreter logs back. Is this a bug in the code, or did I misunderstand something?
Regards,
Weipu
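
A sketch of the change described above in bin/interpreter.sh, exactly as reported (whether an explicit ':' separator is also needed depends on the surrounding script):

# before: overwrites any classpath entries set earlier in the script
ZEPPELIN_CLASSPATH=${SPARK_APP_JAR}
# after: appends, so earlier entries (reportedly including the logging setup) survive
ZEPPELIN_CLASSPATH+=${SPARK_APP_JAR}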

Contributor

@weipuz Thanks for digging into it!
I noticed that issue too (I set SPARK_HOME and could not get the Spark log file).
I also changed ZEPPELIN_CLASSPATH=${SPARK_APP_JAR} to ZEPPELIN_CLASSPATH+=${SPARK_APP_JAR} as you said, and finally I got my zeppelin-interpreter-spark-***.log file back. As long as this is not the intended implementation, I think we need to fix it.

Member Author

I think it needs to be fixed 👍

Contributor

@weipuz @Leemoonsoo I pushed a patch for this issue with a HOT FIX tag at #769 :)
Thanks again @weipuz for reporting this.

lelou6666 pushed a commit to lelou6666/incubator-zeppelin that referenced this pull request Mar 25, 2016
Changing mailinglist address to apache one