ZEPPELIN-262 Use spark-submit to run spark interpreter process #270
Conversation
Ready to merge. Please review the changes.
conf/zeppelin-env.sh.template
Outdated
Is there a reason the last half is taken out?
Brought back those lines.
This is an awesome improvement, thank you @Leemoonsoo!
I have pushed more commits that handle pyspark. Please review them, too.
CDH? Is there a Hadoop-distribution-specific path?
It's part of a heuristic to search for and add Hadoop jar files.
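For illustration only, such a heuristic could look roughly like the shell sketch below; the directories probed here (e.g. /usr/lib/hadoop and the CDH parcel path) are assumptions, not necessarily the ones interpreter.sh actually checks:

```sh
# Illustrative sketch: probe a few common Hadoop install locations
# (including a CDH-style parcel path) and append any jars found.
for dir in "${HADOOP_HOME:-/usr/lib/hadoop}" \
           /opt/cloudera/parcels/CDH/lib/hadoop; do
  if [ -d "${dir}" ]; then
    for jar in "${dir}"/*.jar; do
      [ -f "${jar}" ] && ZEPPELIN_CLASSPATH="${ZEPPELIN_CLASSPATH}:${jar}"
    done
  fi
done
```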
@Leemoonsoo how does z.load work with spark-submit? It seems those dependency jars should be added automatically to spark-submit's --jars argument.
@randerzander dependency jars downloaded via z.load() are loaded after the SparkContext is created, by calling sc.addJar(). So I think they will not be affected by this change.
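For reference, dependencies that are known up front can be handed to spark-submit through its --jars option (a standard spark-submit flag that takes a comma-separated list); the main class and jar paths below are placeholders, and as noted above this is not what z.load() relies on, since those jars are added later via sc.addJar():

```sh
# Placeholder main class and jar paths; --jars takes a comma-separated list.
${SPARK_HOME}/bin/spark-submit \
  --class com.example.Main \
  --jars /path/to/dep1.jar,/path/to/dep2.jar \
  /path/to/app.jar
```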
Looks good!
Merging, if there are no more discussions.
https://issues.apache.org/jira/browse/ZEPPELIN-262

This patch makes Zeppelin use spark-submit to run the Spark interpreter process when SPARK_HOME is defined. This will potentially solve all the configuration problems related to the Spark interpreter.

#### How to use?

Define the SPARK_HOME env variable in conf/zeppelin-env.sh. Zeppelin will then use your SPARK_HOME/bin/spark-submit, so you will not need any additional configuration :-)

#### Backward compatibility

If you have not defined SPARK_HOME, you are still able to run the Spark interpreter the old (current) way. However, it is not encouraged anymore.

Author: Lee moon soo <[email protected]>

Closes apache#270 from Leemoonsoo/spark_submit and squashes the following commits:

4eb0848 [Lee moon soo] export and check SPARK_SUBMIT
a8a3440 [Lee moon soo] handle spark.files correctly for pyspark when spark-submit is used
d4acd1b [Lee moon soo] Add PYTHONPATH
c9418c6 [Lee moon soo] Bring back some entries with more commments
cac2bb8 [Lee moon soo] Take care classpath of SparkIMain
5d3154e [Lee moon soo] Remove clean. otherwise mvn clean package will remove interpreter/spark/dep directory
2d27e9c [Lee moon soo] use spark-submit to run spark interpreter process when SPARK_HOME is defined

(cherry picked from commit b4b4f55)

Signed-off-by: Lee moon soo <[email protected]>
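As a minimal illustration of the "How to use?" setup above (the Spark installation path is just an example):

```sh
# conf/zeppelin-env.sh
# Point Zeppelin at an existing Spark installation; the interpreter launcher
# will then start the Spark interpreter via ${SPARK_HOME}/bin/spark-submit.
export SPARK_HOME=/opt/spark   # example path; use your own installation
```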
Hello, I might be wrong, but it seems to me that this change causes an error. When that issue is resolved as suggested by this email, something else happens, which sadly blocks me from using it. Is there a workaround for this? Am I doing something wrong?
@smusevic It looks like you have
@Leemoonsoo thanks for your reply. I most definitely do not have
@smusevic
Thanks, it turned out that
export SPARK_SUBMIT="${SPARK_HOME}/bin/spark-submit"
SPARK_APP_JAR="$(ls ${ZEPPELIN_HOME}/interpreter/spark/zeppelin-spark*.jar)"
# This will eventually pass SPARK_APP_JAR to the classpath of SparkIMain
ZEPPELIN_CLASSPATH=${SPARK_APP_JAR}
Hello, when I set SPARK_HOME to my external Spark, I found that the zeppelin-interpreter-sparkxxx.log file was gone. I dug further and found that if I change line 79 in interpreter.sh from ZEPPELIN_CLASSPATH=${SPARK_APP_JAR} to ZEPPELIN_CLASSPATH+=${SPARK_APP_JAR}, I get all the Spark interpreter logs back. Is this a bug in the code or did I misunderstand something?
Regards,
Weipu
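A sketch of the change being described, assuming it is applied to the interpreter.sh line quoted above (a ":" separator may also be needed, depending on how the preceding classpath entries are assembled):

```sh
# Reported behavior: assigning here overwrites the classpath built so far,
# and the interpreter's logging configuration is lost along with it.
ZEPPELIN_CLASSPATH=${SPARK_APP_JAR}

# Suggested change: append instead, keeping the earlier entries intact.
ZEPPELIN_CLASSPATH+=${SPARK_APP_JAR}
```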
@weipuz Thanks for digging into it!
I noticed that issue too. (I set SPARK_HOME and cannot get the Spark log file.)
I also changed ZEPPELIN_CLASSPATH=${SPARK_APP_JAR} to ZEPPELIN_CLASSPATH+=${SPARK_APP_JAR} as you said, and then I finally got my zeppelin-interpreter-spark-***.log file. Assuming it's not the intended implementation, I think we need to fix this.
I think it needs to be fixed 👍
@weipuz @Leemoonsoo I pushed a patch for this issue with a HOT FIX tag at #769 :)
Thanks again @weipuz for reporting this.
Changing mailing list address to the Apache one