ZEPPELIN-262 Use spark-submit to run spark interpreter process #270
Closed
7 commits
2d27e9c use spark-submit to run spark interpreter process when SPARK_HOME is … (Leemoonsoo)
5d3154e Remove clean. otherwise mvn clean package will remove interpreter/spa… (Leemoonsoo)
cac2bb8 Take care classpath of SparkIMain (Leemoonsoo)
c9418c6 Bring back some entries with more commments (Leemoonsoo)
d4acd1b Add PYTHONPATH (Leemoonsoo)
a8a3440 handle spark.files correctly for pyspark when spark-submit is used (Leemoonsoo)
4eb0848 export and check SPARK_SUBMIT (Leemoonsoo)
| Original file line number | Diff line number | Diff line change |
|---|---|---|
|
|
@@ -72,77 +72,57 @@ fi | |
|
|
||
| # set spark related env variables | ||
| if [[ "${INTERPRETER_ID}" == "spark" ]]; then | ||
| # add Hadoop jars into classpath | ||
| if [[ -n "${HADOOP_HOME}" ]]; then | ||
| # Apache | ||
| addEachJarInDir "${HADOOP_HOME}/share" | ||
|
|
||
| # CDH | ||
| addJarInDir "${HADOOP_HOME}" | ||
| addJarInDir "${HADOOP_HOME}/lib" | ||
| fi | ||
|
|
||
| # autodetect HADOOP_CONF_HOME by heuristic | ||
| if [[ -n "${HADOOP_HOME}" ]] && [[ -z "${HADOOP_CONF_DIR}" ]]; then | ||
| if [[ -d "${HADOOP_HOME}/etc/hadoop" ]]; then | ||
| export HADOOP_CONF_DIR="${HADOOP_HOME}/etc/hadoop" | ||
| elif [[ -d "/etc/hadoop/conf" ]]; then | ||
| export HADOOP_CONF_DIR="/etc/hadoop/conf" | ||
| fi | ||
| fi | ||
|
|
||
| if [[ -n "${HADOOP_CONF_DIR}" ]] && [[ -d "${HADOOP_CONF_DIR}" ]]; then | ||
| ZEPPELIN_CLASSPATH+=":${HADOOP_CONF_DIR}" | ||
| fi | ||
|
|
||
| # add Spark jars into classpath | ||
| if [[ -n "${SPARK_HOME}" ]]; then | ||
| addJarInDir "${SPARK_HOME}/lib" | ||
| PYSPARKPATH="${SPARK_HOME}/python:${SPARK_HOME}/python/lib/pyspark.zip:${SPARK_HOME}/python/lib/py4j-0.8.2.1-src.zip" | ||
| SPARK_SUBMIT="${SPARK_HOME}/bin/spark-submit" | ||
| SPARK_APP_JAR="$(ls ${ZEPPELIN_HOME}/interpreter/spark/zeppelin-spark*.jar)" | ||
| # This will evantually passes SPARK_APP_JAR to classpath of SparkIMain | ||
| ZEPPELIN_CLASSPATH=${SPARK_APP_JAR} | ||
| else | ||
| # add Hadoop jars into classpath | ||
| if [[ -n "${HADOOP_HOME}" ]]; then | ||
| # Apache | ||
| addEachJarInDir "${HADOOP_HOME}/share" | ||
|
|
||
| # CDH | ||
|
Contributor
CDH? Is there a Hadoop distribution-specific path?
Member Author
It's part of the heuristic that searches for and adds Hadoop jar files.
||
| addJarInDir "${HADOOP_HOME}" | ||
| addJarInDir "${HADOOP_HOME}/lib" | ||
| fi | ||
|
|
||
| addJarInDir "${INTERPRETER_DIR}/dep" | ||
| PYSPARKPATH="${ZEPPELIN_HOME}/interpreter/spark/pyspark/pyspark.zip:${ZEPPELIN_HOME}/interpreter/spark/pyspark/py4j-0.8.2.1-src.zip" | ||
| fi | ||
|
|
||
| # autodetect SPARK_CONF_DIR | ||
| if [[ -n "${SPARK_HOME}" ]] && [[ -z "${SPARK_CONF_DIR}" ]]; then | ||
| if [[ -d "${SPARK_HOME}/conf" ]]; then | ||
| SPARK_CONF_DIR="${SPARK_HOME}/conf" | ||
| if [[ -z "${PYTHONPATH}" ]]; then | ||
| export PYTHONPATH="${PYSPARKPATH}" | ||
| else | ||
| export PYTHONPATH="${PYTHONPATH}:${PYSPARKPATH}" | ||
| fi | ||
| unset PYSPARKPATH | ||
|
|
||
| # autodetect HADOOP_CONF_HOME by heuristic | ||
| if [[ -n "${HADOOP_HOME}" ]] && [[ -z "${HADOOP_CONF_DIR}" ]]; then | ||
| if [[ -d "${HADOOP_HOME}/etc/hadoop" ]]; then | ||
| export HADOOP_CONF_DIR="${HADOOP_HOME}/etc/hadoop" | ||
| elif [[ -d "/etc/hadoop/conf" ]]; then | ||
| export HADOOP_CONF_DIR="/etc/hadoop/conf" | ||
| fi | ||
| fi | ||
| fi | ||
|
|
||
| # read spark-*.conf if exists | ||
| if [[ -d "${SPARK_CONF_DIR}" ]]; then | ||
| ls ${SPARK_CONF_DIR}/spark-*.conf > /dev/null 2>&1 | ||
| if [[ "$?" -eq 0 ]]; then | ||
| for file in ${SPARK_CONF_DIR}/spark-*.conf; do | ||
| while read -r line; do | ||
| echo "${line}" | grep -e "^spark[.]" > /dev/null | ||
| if [ "$?" -ne 0 ]; then | ||
| # skip the line not started with 'spark.' | ||
| continue; | ||
| fi | ||
| SPARK_CONF_KEY=`echo "${line}" | sed -e 's/\(^spark[^ ]*\)[ \t]*\(.*\)/\1/g'` | ||
| SPARK_CONF_VALUE=`echo "${line}" | sed -e 's/\(^spark[^ ]*\)[ \t]*\(.*\)/\2/g'` | ||
| export ZEPPELIN_JAVA_OPTS+=" -D${SPARK_CONF_KEY}=\"${SPARK_CONF_VALUE}\"" | ||
| done < "${file}" | ||
| done | ||
| if [[ -n "${HADOOP_CONF_DIR}" ]] && [[ -d "${HADOOP_CONF_DIR}" ]]; then | ||
| ZEPPELIN_CLASSPATH+=":${HADOOP_CONF_DIR}" | ||
| fi | ||
| fi | ||
|
|
||
| if [[ -z "${PYTHONPATH}" ]]; then | ||
| export PYTHONPATH="${PYSPARKPATH}" | ||
| else | ||
| export PYTHONPATH="${PYTHONPATH}:${PYSPARKPATH}" | ||
| export SPARK_CLASSPATH+=":${ZEPPELIN_CLASSPATH}" | ||
| fi | ||
|
|
||
| unset PYSPARKPATH | ||
| fi | ||
|
|
||
| export SPARK_CLASSPATH+=":${ZEPPELIN_CLASSPATH}" | ||
| CLASSPATH+=":${ZEPPELIN_CLASSPATH}" | ||
|
|
||
| ${ZEPPELIN_RUNNER} ${JAVA_INTP_OPTS} -cp ${CLASSPATH} ${ZEPPELIN_SERVER} ${PORT} & | ||
| if [[ -n "${SPARK_SUBMIT}" ]]; then | ||
| ${SPARK_SUBMIT} --class ${ZEPPELIN_SERVER} --driver-class-path "${CLASSPATH}" --driver-java-options "${JAVA_INTP_OPTS}" ${SPARK_SUBMIT_OPTIONS} ${SPARK_APP_JAR} ${PORT} & | ||
| else | ||
| ${ZEPPELIN_RUNNER} ${JAVA_INTP_OPTS} -cp ${CLASSPATH} ${ZEPPELIN_SERVER} ${PORT} & | ||
| fi | ||
|
|
||
| pid=$! | ||
| if [[ -z "${pid}" ]]; then | ||
| return 1; | ||
|
|
||
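The end of the hunk above picks between `spark-submit` and a plain JVM launch depending on whether `SPARK_SUBMIT` is set. The following is a minimal sketch of that branch, not the actual interpreter.sh code: the function name `build_launch_cmd`, the class name, and every path and port are hypothetical, and the command is printed rather than executed so the sketch can run without Spark installed.

```shell
#!/usr/bin/env bash
# Sketch only: mirrors the if/else in the diff, but echoes the command
# instead of starting a process. All values below are hypothetical.

build_launch_cmd() {
  # $1: path to spark-submit, or an empty string when SPARK_HOME is not set
  local spark_submit="$1"
  local server="org.example.InterpreterServer"   # stand-in for ZEPPELIN_SERVER
  local classpath="/opt/zeppelin/conf"           # stand-in for CLASSPATH
  local app_jar="/opt/zeppelin/interpreter/spark/zeppelin-spark-example.jar"
  local port="12345"

  if [[ -n "${spark_submit}" ]]; then
    # spark-submit path: the interpreter jar is the application jar
    echo "${spark_submit} --class ${server} --driver-class-path ${classpath} ${app_jar} ${port}"
  else
    # fallback path: launch the JVM directly with the assembled classpath
    echo "java -cp ${classpath} ${server} ${port}"
  fi
}

with_spark="$(build_launch_cmd "/opt/spark/bin/spark-submit")"
without_spark="$(build_launch_cmd "")"
echo "${with_spark}"
echo "${without_spark}"
```

Printing the command first is also a convenient way to check what would be launched before backgrounding it with `&` as the real script does.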
Hello, when I set `SPARK_HOME` to my external Spark, I found that the zeppelin-interpreter-sparkxxx.log file is gone. I dug further and found that if I change line 79 in interpreter.sh from
`ZEPPELIN_CLASSPATH=${SPARK_APP_JAR}` to `ZEPPELIN_CLASSPATH+=${SPARK_APP_JAR}`, I get all the Spark interpreter logs back. Is this a bug in the code, or did I misunderstand something?
Regards,
Weipu
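The difference between the two forms can be shown in isolation. This is a sketch with hypothetical paths, not an excerpt from interpreter.sh: it only illustrates that `=` discards whatever `ZEPPELIN_CLASSPATH` held before, while `+=` keeps it. It does not establish why the log file disappears. Since `+=` on a bash string is plain concatenation, the sketch adds a `:` separator explicitly, which is an assumption about what a classpath entry needs and goes beyond the one-line change quoted above.

```shell
#!/usr/bin/env bash
# Hypothetical values, for illustration only.
EXISTING_CLASSPATH="/opt/zeppelin/conf:/opt/zeppelin/lib/*"
SPARK_APP_JAR="/opt/zeppelin/interpreter/spark/zeppelin-spark-example.jar"

# Form used in the PR: plain assignment replaces the previous value.
ZEPPELIN_CLASSPATH="${EXISTING_CLASSPATH}"
ZEPPELIN_CLASSPATH=${SPARK_APP_JAR}
overwritten="${ZEPPELIN_CLASSPATH}"

# Appending form: the previous entries survive. The leading ':' is added
# here because '+=' does not insert a separator on its own.
ZEPPELIN_CLASSPATH="${EXISTING_CLASSPATH}"
ZEPPELIN_CLASSPATH+=":${SPARK_APP_JAR}"
appended="${ZEPPELIN_CLASSPATH}"

echo "overwritten: ${overwritten}"
echo "appended:    ${appended}"
```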
@weipuz Thanks for digging into it!
I noticed that issue too (I set `SPARK_HOME` and could not get the Spark log file). I also changed `ZEPPELIN_CLASSPATH=${SPARK_APP_JAR}` to `ZEPPELIN_CLASSPATH+=${SPARK_APP_JAR}` as you said, and then I could finally get my `zeppelin-interpreter-spark-***.log` file. Unless this is the intended implementation, I think we need to fix this.
I think it needs to be fixed 👍
@weipuz @Leemoonsoo I pushed a patch for this issue, tagged HOT FIX, in #769 :)
Thanks again @weipuz for reporting this.