Skip to content

Conversation

@marsishandsome
Copy link

When running "bin/computer-classpath.sh", the output will be:
:/spark/conf:/spark/assembly/target/scala-2.10/spark-assembly-1.3.0-hadoop2.5.0-cdh5.2.0.jar:/spark/lib_managed/jars/datanucleus-rdbms-3.2.9.jar:/spark/lib_managed/jars/datanucleus-api-jdo-3.2.6.jar:/spark/lib_managed/jars/datanucleus-core-3.2.10.jar
Java will add the current working dir to the CLASSPATH, if the first ":" exists, which is not expected by spark users.
For example, if I call spark-shell in the folder /root. And there exists a "core-site.xml" under /root/. Spark will use this file as HADOOP CONF file, even if I have already set HADOOP_CONF_DIR=/etc/hadoop/conf.

@AmplabJenkins
Copy link

Can one of the admins verify this patch?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So, specifically this is taking care of the case where SPARK_SUBMIT_CLASSPATH is empty or not set? Otherwise I think it would be equivalent, but yeah, we should handle that case and not add empty elements.

However your change here also puts the conf directories first, rather than last, on the classpath. I don't know if that was intentional but I suppose it would be better to not change that here if the problem being solved is just an empty classpath element.

@srowen
Copy link
Member

srowen commented Mar 24, 2015

ok to test

@SparkQA
Copy link

SparkQA commented Mar 24, 2015

Test build #29081 has started for PR 5156 at commit 35c25d4.

  • This patch merges cleanly.

@SparkQA
Copy link

SparkQA commented Mar 24, 2015

Test build #29081 has finished for PR 5156 at commit 35c25d4.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@AmplabJenkins
Copy link

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/29081/
Test PASSed.

@SparkQA
Copy link

SparkQA commented Mar 24, 2015

Test build #29092 has started for PR 5156 at commit 3e859f9.

  • This patch merges cleanly.

@marsishandsome
Copy link
Author

@srowen I've pushed another commit and keep the original order of the classpath.
A help function "addClassPath()" is used to avoid code duplication.
I'm pleased to modify other codes to use "addClassPath", if you think it's ok.

@SparkQA
Copy link

SparkQA commented Mar 24, 2015

Test build #29092 has finished for PR 5156 at commit 3e859f9.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@AmplabJenkins
Copy link

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/29092/
Test PASSed.

@srowen
Copy link
Member

srowen commented Mar 24, 2015

I like this. @nchammas for bash thoughts.
@vanzin does the new launcher have any issue of this form? It replaces the file being changed.
There are many places in this file that could use the new function? might be more consistent. Maybe call it appendToClasspath to be clearer about where it adds.

@nchammas
Copy link
Contributor

Bash changes LGTM. ⭐

@vanzin
Copy link
Contributor

vanzin commented Mar 24, 2015

The launcher lib in master shouldn't have this problem (see AbstractCommandBuilder::addToClassPath; classpath is treated as a list, not as a string.)

@marsishandsome
Copy link
Author

@srowen Please review.

@SparkQA
Copy link

SparkQA commented Mar 25, 2015

Test build #29129 has started for PR 5156 at commit 0656eeb.

  • This patch does not merge cleanly.

@SparkQA
Copy link

SparkQA commented Mar 25, 2015

Test build #29136 has started for PR 5156 at commit 5ae214f.

  • This patch merges cleanly.

@SparkQA
Copy link

SparkQA commented Mar 25, 2015

Test build #29129 has finished for PR 5156 at commit 0656eeb.

  • This patch passes all tests.
  • This patch does not merge cleanly.
  • This patch adds no public classes.

@AmplabJenkins
Copy link

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/29129/
Test PASSed.

@SparkQA
Copy link

SparkQA commented Mar 25, 2015

Test build #29136 has finished for PR 5156 at commit 5ae214f.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@AmplabJenkins
Copy link

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/29136/
Test PASSed.

@srowen
Copy link
Member

srowen commented Mar 25, 2015

I like this, especially how much it reduces duplication. Obviously this can go into 1.3 at the latest. If there's little or no conflict, I think it can be merged back to 1.2 too. Let me leave it a short while longer for comments.

@sryza
Copy link
Contributor

sryza commented Mar 25, 2015

Just to confirm my understanding: this makes it so that Spark no longer puts the current working dir on the classpath? Or it changes Spark to now include the current working dir?

@srowen
Copy link
Member

srowen commented Mar 25, 2015

This takes the current working dir off of the classpath, since it appears to be there only accidentally, because weirdly "java -cp :foo.jar" (note the leading empty entry) does this.

asfgit pushed a commit that referenced this pull request Mar 26, 2015
When running "bin/computer-classpath.sh", the output will be:
:/spark/conf:/spark/assembly/target/scala-2.10/spark-assembly-1.3.0-hadoop2.5.0-cdh5.2.0.jar:/spark/lib_managed/jars/datanucleus-rdbms-3.2.9.jar:/spark/lib_managed/jars/datanucleus-api-jdo-3.2.6.jar:/spark/lib_managed/jars/datanucleus-core-3.2.10.jar
Java will add the current working dir to the CLASSPATH, if the first ":" exists, which is not expected by spark users.
For example, if I call spark-shell in the folder /root. And there exists a "core-site.xml" under /root/. Spark will use this file as HADOOP CONF file, even if I have already set HADOOP_CONF_DIR=/etc/hadoop/conf.

Author: guliangliang <[email protected]>

Closes #5156 from marsishandsome/Spark6491 and squashes the following commits:

5ae214f [guliangliang] use appendToClasspath to change CLASSPATH
b21f3b2 [guliangliang] keep the classpath order
5d1f870 [guliangliang] [SPARK-6491] Spark will put the current working dir to the CLASSPATH
@srowen
Copy link
Member

srowen commented Mar 27, 2015

@marsishandsome could you close this PR? since it wasn't opened vs master, the automatic process does not close it. It was merged into branch-1.3.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

7 participants