[SPARK-1897] Respect spark.jars (and --jars) in spark-shell #849

andrewor14 · 2014-05-21T21:26:28Z

Spark shell currently overwrites spark.jars with ADD_JARS. In all modes except yarn-cluster, this means the --jars flag passed to bin/spark-shell is also discarded. However, in the docs, we explicitly tell users to add jars this way.
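As a rough illustration of the intended behavior, the sketch below merges the two sources instead of letting one clobber the other. It is a Python analogue, not the Scala code in this PR. The helper name `effective_jars` and the dict-based `props` and `env` arguments are hypothetical, and the precedence (a non-empty spark.jars wins, ADD_JARS is the fallback) is an assumption that this thread does not spell out.

```python
def effective_jars(props, env):
    """Return the list of jars the shell should load.

    Hypothetical helper: `props` stands in for Spark configuration
    properties and `env` for the process environment.
    """
    # Assumption: an empty spark.jars is treated the same as an unset one.
    prop_jars = props.get("spark.jars") or None
    env_jars = env.get("ADD_JARS")
    # Assumption: spark.jars takes precedence; ADD_JARS is only a fallback.
    raw = prop_jars if prop_jars is not None else (env_jars or "")
    # Drop empty entries so "" never turns into a bogus jar path.
    return [jar for jar in raw.split(",") if jar]
```

With this shape, a jar list passed through spark.jars survives even when ADD_JARS is also set, and ADD_JARS keeps working when spark.jars is absent.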

AmplabJenkins · 2014-05-21T21:27:58Z

Merged build triggered.

AmplabJenkins · 2014-05-21T21:28:03Z

Merged build started.

AmplabJenkins · 2014-05-21T21:52:58Z

Merged build triggered.

AmplabJenkins · 2014-05-21T21:53:03Z

Merged build started.

AmplabJenkins · 2014-05-21T22:29:40Z

Merged build finished. All automated tests passed.

AmplabJenkins · 2014-05-21T22:29:40Z

Merged build finished.

AmplabJenkins · 2014-05-21T22:29:41Z

All automated tests passed.
Refer to this link for build results: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/15129/

AmplabJenkins · 2014-05-21T22:29:41Z

Refer to this link for build results: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/15127/

andrewor14 · 2014-05-21T22:30:41Z

Jenkins, test this please

AmplabJenkins · 2014-05-21T22:32:58Z

Merged build triggered.

AmplabJenkins · 2014-05-21T22:33:03Z

Merged build started.

tdas · 2014-05-21T22:37:48Z

core/src/main/scala/org/apache/spark/deploy/SparkSubmit.scala

This change ensures that SparkSubmit does not set spark.jars to an empty string if no jars have been specified. This differs from the previous behavior, where the empty string was passed downstream to YARN, etc. What are the repercussions of this?

@pwendell

AmplabJenkins · 2014-05-21T23:34:44Z

Merged build finished.

AmplabJenkins · 2014-05-21T23:34:44Z

Refer to this link for build results: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/15131/

andrewor14 · 2014-05-21T23:56:38Z

Jenkins, test this please

AmplabJenkins · 2014-05-21T23:57:57Z

Merged build triggered.

AmplabJenkins · 2014-05-21T23:58:04Z

Merged build started.

AmplabJenkins · 2014-05-22T01:29:05Z

Merged build finished.

AmplabJenkins · 2014-05-22T01:29:06Z

Refer to this link for build results: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/15134/

AmplabJenkins · 2014-05-22T02:07:57Z

Merged build triggered.

AmplabJenkins · 2014-05-22T02:08:05Z

Merged build started.

AmplabJenkins · 2014-05-22T02:12:57Z

Merged build triggered.

AmplabJenkins · 2014-05-22T02:13:05Z

Merged build started.

AmplabJenkins · 2014-05-22T02:50:19Z

Merged build finished.

AmplabJenkins · 2014-05-22T02:50:19Z

Refer to this link for build results: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/15135/

AmplabJenkins · 2014-05-22T02:53:32Z

Merged build finished. All automated tests passed.

AmplabJenkins · 2014-05-22T02:53:32Z

All automated tests passed.
Refer to this link for build results: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/15136/

tdas · 2014-05-23T03:25:06Z

Tested it in a standalone cluster, confirmed that --jars works with this change. And so does ADD_JARS. Merging this. Thanks @andrewor14

Spark shell currently overwrites `spark.jars` with `ADD_JARS`. In all modes except yarn-cluster, this means the `--jars` flag passed to `bin/spark-shell` is also discarded. However, in the [docs](http://people.apache.org/~pwendell/spark-1.0.0-rc7-docs/scala-programming-guide.html#initializing-spark), we explicitly tell the users to add the jars this way.

Author: Andrew Or

Closes #849 from andrewor14/shell-jars and squashes the following commits:

928a7e6 [Andrew Or] ',' -> "," (minor)
afc357c [Andrew Or] Handle spark.jars == "" in SparkILoop, not SparkSubmit
c6da113 [Andrew Or] Do not set spark.jars to ""
d8549f7 [Andrew Or] Respect spark.jars and --jars in spark-shell

(cherry picked from commit 8edbee7)
Signed-off-by: Tathagata Das

If I run the following on a YARN cluster

```
bin/spark-submit sheep.py --master yarn-client
```

it fails because of a mismatch in paths: `spark-submit` thinks that `sheep.py` resides on HDFS, and balks when it can't find the file there. A natural workaround is to add the `file:` prefix to the file:

```
bin/spark-submit file:/path/to/sheep.py --master yarn-client
```

However, this also fails. This time it is because python does not understand URI schemes.

This PR fixes this by automatically resolving all paths passed as command line argument to `spark-submit` properly. This has the added benefit of keeping file and jar paths consistent across different cluster modes. For python, we strip the URI scheme before we actually try to run it.

Much of the code is originally written by @mengxr. Tested on YARN cluster. More tests pending.

Author: Andrew Or

Closes #853 from andrewor14/submit-paths and squashes the following commits:

0bb097a [Andrew Or] Format path correctly before adding it to PYTHONPATH
323b45c [Andrew Or] Include --py-files on PYTHONPATH for pyspark shell
3c36587 [Andrew Or] Improve error messages (minor)
854aa6a [Andrew Or] Guard against NPE if user gives pathological paths
6638a6b [Andrew Or] Fix spark-shell jar paths after #849 went in
3bb0359 [Andrew Or] Update more comments (minor)
2a1f8a0 [Andrew Or] Update comments (minor)
6af2c77 [Andrew Or] Merge branch 'master' of github.com:apache/spark into submit-paths
a68c4d1 [Andrew Or] Handle Windows python file path correctly
427a250 [Andrew Or] Resolve paths properly for Windows
a591a4a [Andrew Or] Update tests for resolving URIs
6c8621c [Andrew Or] Move resolveURIs to Utils
db8255e [Andrew Or] Merge branch 'master' of github.com:apache/spark into submit-paths
f542dce [Andrew Or] Fix outdated tests
691c4ce [Andrew Or] Ignore special primary resource names
5342ac7 [Andrew Or] Add missing space in error message
02f77f3 [Andrew Or] Resolve command line arguments to spark-submit properly

(cherry picked from commit 5081a0a)
Signed-off-by: Tathagata Das
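The path handling described in the #853 message can be sketched roughly as follows. This is a Python analogue under stated assumptions, not the code from that PR: the function names `resolve_uri`, `resolve_uris`, and `strip_scheme` are hypothetical, the sketch covers POSIX-style paths only, and it ignores the Windows cases that the commit list mentions.

```python
import os
from urllib.parse import urlparse


def resolve_uri(path):
    """Give a bare local path an explicit file: scheme; leave URIs alone."""
    if urlparse(path).scheme:
        # Already has a scheme such as hdfs: or file:, so keep it as given.
        return path
    return "file:" + os.path.abspath(path)


def resolve_uris(paths):
    """Resolve a comma-separated list of paths, skipping empty entries."""
    return ",".join(resolve_uri(p) for p in paths.split(",") if p)


def strip_scheme(uri):
    """Return a plain filesystem path suitable for the Python interpreter."""
    parsed = urlparse(uri)
    return parsed.path if parsed.scheme == "file" else uri
```

Resolving every argument to an explicit URI removes the ambiguity about whether a bare path is local or on HDFS, and stripping the `file:` scheme at the last moment lets the interpreter open the script as an ordinary file.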


* [CARMEL-4737] add lock for compact command
* enable compact transparency for delta
* support recover Partition lock
* fix
* fix style
* Avoid conflict with convertToDelta
* fix
* fix for delta
* double check leaffiles when get empty list
* fix bug
* fix ut
* backup compact folder
* should not check signature again when leaffiles are empty

Co-authored-by: fenzhu

Respect spark.jars and --jars in spark-shell

d8549f7

Do not set spark.jars to ""

c6da113

This causes spark-shell to attempt to add the working directory to HTTP server and fail.
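A small Python analogue may help show why an empty spark.jars value is harmful. It is only an illustration: the helper names are hypothetical, and the exact code path inside spark-shell is not shown in this thread. The point is that splitting an empty string on a comma yields one empty entry rather than no entries, and an empty relative path resolves to the current working directory.

```python
import os


def split_jars_naive(value):
    # Splitting "" on "," yields [""], a list with one empty entry.
    return value.split(",")


def split_jars_safe(value):
    # Filtering out empty entries turns "" into an empty list.
    return [entry for entry in value.split(",") if entry]


# The single empty entry resolves to the working directory, which is
# the kind of path the shell would then try (and fail) to serve as a jar.
resolved = [os.path.abspath(entry) for entry in split_jars_naive("")]
```

Either leaving the property unset or filtering out empty entries before use avoids the bogus path.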

andrewor14 changed the title ~~[SPARK-1897] Respect spark.jars and --jars in spark-shell~~ [SPARK-1897] Respect spark.jars (and --jars) in REPL May 21, 2014

andrewor14 changed the title ~~[SPARK-1897] Respect spark.jars (and --jars) in REPL~~ [SPARK-1897] Respect spark.jars (and --jars) in spark-shell May 21, 2014

tdas reviewed May 21, 2014

andrewor14 added 2 commits May 21, 2014 19:01

Handle spark.jars == "" in SparkILoop, not SparkSubmit

afc357c

This is a little hacky, but minimizes the changes in SparkSubmit so that there are fewer things we need to reason about.

',' -> "," (minor)

928a7e6

asfgit closed this in 8edbee7 May 23, 2014

andrewor14 mentioned this pull request May 23, 2014

[SPARK-1900 / 1918] PySpark on YARN is broken #853

Closed

andrewor14 added a commit to andrewor14/spark that referenced this pull request May 23, 2014

Fix spark-shell jar paths after apache#849 went in

6638a6b

andrewor14 deleted the shell-jars branch June 9, 2014 17:47
