Skip to content

Conversation

@andrewor14
Copy link
Contributor

Spark shell currently overwrites spark.jars with ADD_JARS. In all modes except yarn-cluster, this means the --jar flag passed to bin/spark-shell is also discarded. However, in the docs, we explicitly tell the users to add the jars this way.

@AmplabJenkins
Copy link

Merged build triggered.

@AmplabJenkins
Copy link

Merged build started.

This causes spark-shell to attempt to add the working directory to
HTTP server and fail.
@AmplabJenkins
Copy link

Merged build triggered.

@AmplabJenkins
Copy link

Merged build started.

@andrewor14 andrewor14 changed the title [SPARK-1897] Respect spark.jars and --jars in spark-shell [SPARK-1897] Respect spark.jars (and --jars) in REPL May 21, 2014
@andrewor14 andrewor14 changed the title [SPARK-1897] Respect spark.jars (and --jars) in REPL [SPARK-1897] Respect spark.jars (and --jars) in spark-shell May 21, 2014
@AmplabJenkins
Copy link

Merged build finished. All automated tests passed.

@AmplabJenkins
Copy link

Merged build finished.

@AmplabJenkins
Copy link

All automated tests passed.
Refer to this link for build results: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/15129/

@AmplabJenkins
Copy link

Refer to this link for build results: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/15127/

@andrewor14
Copy link
Contributor Author

Jenkins, test this please

@AmplabJenkins
Copy link

Merged build triggered.

@AmplabJenkins
Copy link

Merged build started.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This change ensures that spark.jars is not set to an empty string by SparkSubmit, if not jars have been specified. This is different from previous behavior where empty string was being passed on downstream, to YARN, etc. What are the repercussions of this?

@pwendell

@AmplabJenkins
Copy link

Merged build finished.

@AmplabJenkins
Copy link

Refer to this link for build results: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/15131/

@andrewor14
Copy link
Contributor Author

Jenkins, test this please

@AmplabJenkins
Copy link

Merged build triggered.

@AmplabJenkins
Copy link

Merged build started.

@AmplabJenkins
Copy link

Merged build finished.

@AmplabJenkins
Copy link

Refer to this link for build results: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/15134/

This is a little hacky, but minimizes the changes in SparkSubmit
so that there are fewer things we need to reason about.
@AmplabJenkins
Copy link

Merged build triggered.

@AmplabJenkins
Copy link

Merged build started.

@AmplabJenkins
Copy link

Merged build triggered.

@AmplabJenkins
Copy link

Merged build started.

@AmplabJenkins
Copy link

Merged build finished.

@AmplabJenkins
Copy link

Refer to this link for build results: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/15135/

@AmplabJenkins
Copy link

Merged build finished. All automated tests passed.

@AmplabJenkins
Copy link

All automated tests passed.
Refer to this link for build results: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/15136/

@tdas
Copy link
Contributor

tdas commented May 23, 2014

Tested it in a standalone cluster, confirmed that --jars works with this change. And so does ADD_JARS. Merging this. Thanks @andrewor14

@asfgit asfgit closed this in 8edbee7 May 23, 2014
asfgit pushed a commit that referenced this pull request May 23, 2014
Spark shell currently overwrites `spark.jars` with `ADD_JARS`. In all modes except yarn-cluster, this means the `--jar` flag passed to `bin/spark-shell` is also discarded. However, in the [docs](http://people.apache.org/~pwendell/spark-1.0.0-rc7-docs/scala-programming-guide.html#initializing-spark), we explicitly tell the users to add the jars this way.

Author: Andrew Or <[email protected]>

Closes #849 from andrewor14/shell-jars and squashes the following commits:

928a7e6 [Andrew Or] ',' -> "," (minor)
afc357c [Andrew Or] Handle spark.jars == "" in SparkILoop, not SparkSubmit
c6da113 [Andrew Or] Do not set spark.jars to ""
d8549f7 [Andrew Or] Respect spark.jars and --jars in spark-shell

(cherry picked from commit 8edbee7)
Signed-off-by: Tathagata Das <[email protected]>
andrewor14 added a commit to andrewor14/spark that referenced this pull request May 23, 2014
asfgit pushed a commit that referenced this pull request May 25, 2014
If I run the following on a YARN cluster
```
bin/spark-submit sheep.py --master yarn-client
```
it fails because of a mismatch in paths: `spark-submit` thinks that `sheep.py` resides on HDFS, and balks when it can't find the file there. A natural workaround is to add the `file:` prefix to the file:
```
bin/spark-submit file:/path/to/sheep.py --master yarn-client
```
However, this also fails. This time it is because python does not understand URI schemes.

This PR fixes this by automatically resolving all paths passed as command line argument to `spark-submit` properly. This has the added benefit of keeping file and jar paths consistent across different cluster modes. For python, we strip the URI scheme before we actually try to run it.

Much of the code is originally written by @mengxr. Tested on YARN cluster. More tests pending.

Author: Andrew Or <[email protected]>

Closes #853 from andrewor14/submit-paths and squashes the following commits:

0bb097a [Andrew Or] Format path correctly before adding it to PYTHONPATH
323b45c [Andrew Or] Include --py-files on PYTHONPATH for pyspark shell
3c36587 [Andrew Or] Improve error messages (minor)
854aa6a [Andrew Or] Guard against NPE if user gives pathological paths
6638a6b [Andrew Or] Fix spark-shell jar paths after #849 went in
3bb0359 [Andrew Or] Update more comments (minor)
2a1f8a0 [Andrew Or] Update comments (minor)
6af2c77 [Andrew Or] Merge branch 'master' of github.com:apache/spark into submit-paths
a68c4d1 [Andrew Or] Handle Windows python file path correctly
427a250 [Andrew Or] Resolve paths properly for Windows
a591a4a [Andrew Or] Update tests for resolving URIs
6c8621c [Andrew Or] Move resolveURIs to Utils
db8255e [Andrew Or] Merge branch 'master' of github.com:apache/spark into submit-paths
f542dce [Andrew Or] Fix outdated tests
691c4ce [Andrew Or] Ignore special primary resource names
5342ac7 [Andrew Or] Add missing space in error message
02f77f3 [Andrew Or] Resolve command line arguments to spark-submit properly
asfgit pushed a commit that referenced this pull request May 25, 2014
If I run the following on a YARN cluster
```
bin/spark-submit sheep.py --master yarn-client
```
it fails because of a mismatch in paths: `spark-submit` thinks that `sheep.py` resides on HDFS, and balks when it can't find the file there. A natural workaround is to add the `file:` prefix to the file:
```
bin/spark-submit file:/path/to/sheep.py --master yarn-client
```
However, this also fails. This time it is because python does not understand URI schemes.

This PR fixes this by automatically resolving all paths passed as command line argument to `spark-submit` properly. This has the added benefit of keeping file and jar paths consistent across different cluster modes. For python, we strip the URI scheme before we actually try to run it.

Much of the code is originally written by @mengxr. Tested on YARN cluster. More tests pending.

Author: Andrew Or <[email protected]>

Closes #853 from andrewor14/submit-paths and squashes the following commits:

0bb097a [Andrew Or] Format path correctly before adding it to PYTHONPATH
323b45c [Andrew Or] Include --py-files on PYTHONPATH for pyspark shell
3c36587 [Andrew Or] Improve error messages (minor)
854aa6a [Andrew Or] Guard against NPE if user gives pathological paths
6638a6b [Andrew Or] Fix spark-shell jar paths after #849 went in
3bb0359 [Andrew Or] Update more comments (minor)
2a1f8a0 [Andrew Or] Update comments (minor)
6af2c77 [Andrew Or] Merge branch 'master' of github.com:apache/spark into submit-paths
a68c4d1 [Andrew Or] Handle Windows python file path correctly
427a250 [Andrew Or] Resolve paths properly for Windows
a591a4a [Andrew Or] Update tests for resolving URIs
6c8621c [Andrew Or] Move resolveURIs to Utils
db8255e [Andrew Or] Merge branch 'master' of github.com:apache/spark into submit-paths
f542dce [Andrew Or] Fix outdated tests
691c4ce [Andrew Or] Ignore special primary resource names
5342ac7 [Andrew Or] Add missing space in error message
02f77f3 [Andrew Or] Resolve command line arguments to spark-submit properly

(cherry picked from commit 5081a0a)
Signed-off-by: Tathagata Das <[email protected]>
@andrewor14 andrewor14 deleted the shell-jars branch June 9, 2014 17:47
pdeyhim pushed a commit to pdeyhim/spark-1 that referenced this pull request Jun 25, 2014
Spark shell currently overwrites `spark.jars` with `ADD_JARS`. In all modes except yarn-cluster, this means the `--jar` flag passed to `bin/spark-shell` is also discarded. However, in the [docs](http://people.apache.org/~pwendell/spark-1.0.0-rc7-docs/scala-programming-guide.html#initializing-spark), we explicitly tell the users to add the jars this way.

Author: Andrew Or <[email protected]>

Closes apache#849 from andrewor14/shell-jars and squashes the following commits:

928a7e6 [Andrew Or] ',' -> "," (minor)
afc357c [Andrew Or] Handle spark.jars == "" in SparkILoop, not SparkSubmit
c6da113 [Andrew Or] Do not set spark.jars to ""
d8549f7 [Andrew Or] Respect spark.jars and --jars in spark-shell
pdeyhim pushed a commit to pdeyhim/spark-1 that referenced this pull request Jun 25, 2014
If I run the following on a YARN cluster
```
bin/spark-submit sheep.py --master yarn-client
```
it fails because of a mismatch in paths: `spark-submit` thinks that `sheep.py` resides on HDFS, and balks when it can't find the file there. A natural workaround is to add the `file:` prefix to the file:
```
bin/spark-submit file:/path/to/sheep.py --master yarn-client
```
However, this also fails. This time it is because python does not understand URI schemes.

This PR fixes this by automatically resolving all paths passed as command line argument to `spark-submit` properly. This has the added benefit of keeping file and jar paths consistent across different cluster modes. For python, we strip the URI scheme before we actually try to run it.

Much of the code is originally written by @mengxr. Tested on YARN cluster. More tests pending.

Author: Andrew Or <[email protected]>

Closes apache#853 from andrewor14/submit-paths and squashes the following commits:

0bb097a [Andrew Or] Format path correctly before adding it to PYTHONPATH
323b45c [Andrew Or] Include --py-files on PYTHONPATH for pyspark shell
3c36587 [Andrew Or] Improve error messages (minor)
854aa6a [Andrew Or] Guard against NPE if user gives pathological paths
6638a6b [Andrew Or] Fix spark-shell jar paths after apache#849 went in
3bb0359 [Andrew Or] Update more comments (minor)
2a1f8a0 [Andrew Or] Update comments (minor)
6af2c77 [Andrew Or] Merge branch 'master' of github.com:apache/spark into submit-paths
a68c4d1 [Andrew Or] Handle Windows python file path correctly
427a250 [Andrew Or] Resolve paths properly for Windows
a591a4a [Andrew Or] Update tests for resolving URIs
6c8621c [Andrew Or] Move resolveURIs to Utils
db8255e [Andrew Or] Merge branch 'master' of github.com:apache/spark into submit-paths
f542dce [Andrew Or] Fix outdated tests
691c4ce [Andrew Or] Ignore special primary resource names
5342ac7 [Andrew Or] Add missing space in error message
02f77f3 [Andrew Or] Resolve command line arguments to spark-submit properly
wangyum pushed a commit that referenced this pull request May 26, 2023
* [CARMEL-4737] add lock for compact command

* enable compact transparency for delta

* support recover Partition lock

* fix

* fix style

* Avoid conflict with convertToDelta

* fix

* fix for delta

* double check leaffiles when get empty list

* fix bug

* fix ut

* backup compact folder

* should not check signature again when leaffiles are empty

Co-authored-by: fenzhu <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants