Conversation

@riverajo

Simple doc fix for PySpark on YARN.

@felixcheung
Member

It should build this automatically when SPARK_HOME is defined, right?

@Leemoonsoo
Member

I think PYTHONPATH is supposed to be built automatically by bin/interpreter.sh when SPARK_HOME is defined, but SPARK_YARN_USER_ENV is not taken care of.

@felixcheung
Member

Hmm, should we use SparkConf instead then:

https://github.com/apache/spark/blob/69c9c177160e32a2fbc9b36ecc52156077fca6fc/yarn/src/main/scala/org/apache/spark/deploy/yarn/ExecutorRunnable.scala

    // Keep this for backwards compatibility but users should move to the config
    sys.env.get("SPARK_YARN_USER_ENV").foreach { userEnvs =>
      YarnSparkHadoopUtil.setEnvFromInputString(env, userEnvs)
    }

I believe all spark.* in the interpreter settings are passed to SparkConf.

https://github.com/apache/incubator-zeppelin/blob/master/spark/src/main/java/org/apache/zeppelin/spark/SparkInterpreter.java
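
For illustration, a minimal sketch of the config-based route, assuming Spark's documented spark.executorEnv.* and spark.yarn.appMasterEnv.* property forms; the paths and py4j version below are placeholders, not the project's actual settings:

    import org.apache.spark.SparkConf;

    public class YarnEnvConfSketch {
        public static void main(String[] args) {
            // Hypothetical PYTHONPATH; the bundled py4j version varies by Spark release.
            String pythonPath = "/opt/spark/python:/opt/spark/python/lib/py4j-0.8.2.1-src.zip";

            SparkConf conf = new SparkConf();
            // Spark's spark.executorEnv.[EnvironmentVariableName] form sets the
            // variable in each YARN executor's environment.
            conf.set("spark.executorEnv.PYTHONPATH", pythonPath);
            // The spark.yarn.appMasterEnv.* form covers the application master
            // in yarn-cluster mode.
            conf.set("spark.yarn.appMasterEnv.PYTHONPATH", pythonPath);
        }
    }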

Is this a bug:

      if (!key.startsWith("spark.") || !val.trim().isEmpty()) {
        logger.debug(String.format("SparkConf: key = [%s], value = [%s]", key, val));
        conf.set(key, val);
      }

Shouldn't it say if (key.startsWith("spark."))?

@riverajo
Author

It should build this automatically when SPARK_HOME is defined, right?

That seems right: I didn't need to set PYTHONPATH when SPARK_HOME was set correctly, but I do still need SPARK_YARN_USER_ENV.

However, it seems as though PYTHONPATH gets created after zeppelin-env.sh is run. If I don't set both PYTHONPATH and SPARK_YARN_USER_ENV in that file, it complains that SPARK_YARN_USER_ENV is empty and throws an ArrayIndexOutOfBoundsException.
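
A hypothetical illustration of that failure mode (not the actual interpreter.sh or Spark parsing code): naive KEY=VALUE,KEY=VALUE parsing of an empty SPARK_YARN_USER_ENV indexes past the end of the split result:

    public class UserEnvParseSketch {
        public static void main(String[] args) {
            String userEnvs = "";  // SPARK_YARN_USER_ENV left unset or empty
            for (String pair : userEnvs.split(",")) {
                String[] kv = pair.split("=");
                // "".split("=") yields a single empty element, so kv[1] throws
                // ArrayIndexOutOfBoundsException.
                System.out.println(kv[0] + " -> " + kv[1]);
            }
        }
    }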

@Leemoonsoo
Member

@felixcheung

Is this a bug:

      if (!key.startsWith("spark.") || !val.trim().isEmpty()) {
        logger.debug(String.format("SparkConf: key = [%s], value = [%s]", key, val));
        conf.set(key, val);
      }

Shouldn't it say if (key.startsWith("spark."))?

I think it's not a bug; it's designed to not pass an empty value when the key starts with "spark.".
This unit test might be helpful for understanding:
https://github.com/apache/incubator-zeppelin/blob/master/spark/src/test/java/org/apache/zeppelin/spark/SparkInterpreterTest.java#L168
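
A small sketch of that logic, using only the condition quoted above: by De Morgan's law, !key.startsWith("spark.") || !val.trim().isEmpty() is equivalent to !(key.startsWith("spark.") && val.trim().isEmpty()), so a property is set unless it is a spark.* key with an empty value:

    public class SparkConfFilterSketch {
        static boolean shouldSet(String key, String val) {
            return !key.startsWith("spark.") || !val.trim().isEmpty();
        }

        public static void main(String[] args) {
            System.out.println(shouldSet("spark.executor.memory", "2g")); // true: non-empty spark.* value
            System.out.println(shouldSet("spark.executor.memory", ""));   // false: empty spark.* value is skipped
            System.out.println(shouldSet("args", ""));                    // true: non-spark.* keys always pass
        }
    }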

@Leemoonsoo
Member

Hmm, should we use SparkConf instead then:

https://github.com/apache/spark/blob/69c9c177160e32a2fbc9b36ecc52156077fca6fc/yarn/src/main/scala/org/apache/spark/deploy/yarn/ExecutorRunnable.scala

    // Keep this for backwards compatibility but users should move to the config
    sys.env.get("SPARK_YARN_USER_ENV").foreach { userEnvs =>
      YarnSparkHadoopUtil.setEnvFromInputString(env, userEnvs)
    }

I believe all spark.* in the interpreter settings are passed to SparkConf.

How about putting the same comment, "Keep this for backwards compatibility but users should move to the config", in the document and proceeding?

How about taking care of zeppelin-env.sh.template, too?

@felixcheung
Member

@Leemoonsoo good idea.

@corneadoug
Contributor

@riverajo @felixcheung @Leemoonsoo
Any changes to be done here?
Is this documentation change still needed?

@felixcheung
Member

I'm not sure if we do, but the py4j version has changed across the last few Spark releases, so if we do, we would need a way to set the right version to match the Spark version.
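
One way to avoid hard-coding that version, sketched here as an assumption rather than what Zeppelin actually does: glob $SPARK_HOME/python/lib for the bundled py4j-*-src.zip and build PYTHONPATH from whatever is found:

    import java.io.File;
    import java.io.FilenameFilter;

    public class Py4jPathSketch {
        public static void main(String[] args) {
            String sparkHome = System.getenv("SPARK_HOME");
            if (sparkHome == null) {
                System.err.println("SPARK_HOME is not set");
                return;
            }
            // The bundled py4j zip lives under python/lib; its version changes
            // between Spark releases, so match it by pattern instead.
            File[] zips = new File(sparkHome, "python/lib").listFiles(new FilenameFilter() {
                public boolean accept(File dir, String name) {
                    return name.startsWith("py4j-") && name.endsWith("-src.zip");
                }
            });
            if (zips != null && zips.length > 0) {
                System.out.println("PYTHONPATH=" + sparkHome + "/python:" + zips[0].getAbsolutePath());
            }
        }
    }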

asfgit closed this in c38a0a0 May 9, 2018
asfgit pushed a commit that referenced this pull request May 9, 2018
close #83
close #86
close #125
close #133
close #139
close #146
close #193
close #203
close #246
close #262
close #264
close #273
close #291
close #299
close #320
close #347
close #389
close #413
close #423
close #543
close #560
close #658
close #670
close #728
close #765
close #777
close #782
close #783
close #812
close #822
close #841
close #843
close #878
close #884
close #918
close #989
close #1076
close #1135
close #1187
close #1231
close #1304
close #1316
close #1361
close #1385
close #1390
close #1414
close #1422
close #1425
close #1447
close #1458
close #1466
close #1485
close #1492
close #1495
close #1497
close #1536
close #1545
close #1561
close #1577
close #1600
close #1603
close #1678
close #1695
close #1739
close #1748
close #1765
close #1767
close #1776
close #1783
close #1799