@abellina
Contributor

@abellina abellina commented Feb 15, 2019

What changes were proposed in this pull request?

prepareSubmitEnvironment performs globbing that fails when a proxy user (--proxy-user) doesn't have permission to access the file. The bug is also present in 2.3, so we should backport the fix: currently you can't launch an application that, for instance, passes a file under --archives when that file is owned by the target user.

The solution is to call prepareSubmitEnvironment within a doAs context if proxying.
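
To illustrate the idea only, here is a toy Python model of running resource resolution inside an impersonation context. Nothing here is Spark's or Hadoop's actual API: `do_as`, `current_user`, `resolve_archives`, the `FILES` table, and the user names are all hypothetical, and the permission check is reduced to "only the owner may read".

```python
from contextlib import contextmanager
import threading

# Toy identity tracking; "real_user" is a made-up default for the submitter.
_identity = threading.local()


def current_user():
    return getattr(_identity, "user", "real_user")


@contextmanager
def do_as(user):
    """Run the enclosed block as `user`, then restore the previous identity."""
    previous = current_user()
    _identity.user = user
    try:
        yield
    finally:
        _identity.user = previous


# Hypothetical file table: path -> owner. Only the owner may read a file.
FILES = {"/data/target/archive.zip": "target_user"}


def glob_resources(prefix):
    """Pretend glob: list matching paths, failing on unreadable files."""
    matches = []
    for path, owner in FILES.items():
        if path.startswith(prefix):
            if owner != current_user():
                raise PermissionError(f"{current_user()} cannot read {path}")
            matches.append(path)
    return matches


def resolve_archives(prefix, proxy_user=None):
    """Resolve resources, inside do_as when a proxy user is given."""
    if proxy_user is not None:
        with do_as(proxy_user):
            return glob_resources(prefix)
    return glob_resources(prefix)
```

In this model, `resolve_archives("/data/target/")` raises `PermissionError` because the globbing runs as the submitting user, while `resolve_archives("/data/target/", proxy_user="target_user")` succeeds because the globbing runs as the file's owner.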

How was this patch tested?

Manual tests running with --proxy-user and --archives, before and after, showing that the globbing is successful when the resource is owned by the target user.

I've looked at writing unit tests, but I am not sure I can do that cleanly (perhaps with a custom FileSystem). Open to ideas.

Please review http://spark.apache.org/contributing.html before opening a pull request.

@abellina abellina force-pushed the SPARK-26895_prepareSubmitEnvironment_from_doAs branch from 71f8f26 to f082fc6 Compare February 16, 2019 03:54
@abellina abellina changed the title SPARK-26895: prepareSubmitEnvironment should be called within doAs fo… [SPARK-26895][CORE]: prepareSubmitEnvironment should be called within doAs fo… Feb 16, 2019
@abellina abellina changed the title [SPARK-26895][CORE]: prepareSubmitEnvironment should be called within doAs fo… [SPARK-26895][CORE] prepareSubmitEnvironment should be called within doAs fo… Feb 16, 2019
@abellina abellina marked this pull request as ready for review February 16, 2019 20:18
@abellina
Contributor Author

abellina commented Feb 16, 2019

@jerryshao pinging you as the feature was introduced here: #18235.

Also pinging reviewers @vanzin, @cloud-fan, @jiangxb1987. Thanks in advance.

@abellina
Contributor Author

ok to test

@abellina abellina changed the title [SPARK-26895][CORE] prepareSubmitEnvironment should be called within doAs fo… [SPARK-26895][CORE] proxy-users issue submitting applications with archives and other resources Feb 19, 2019
@vanzin
Contributor

vanzin commented Feb 19, 2019

ok to test

@vanzin
Contributor

vanzin commented Feb 19, 2019

BTW your previous PR title was better. Describe the fix, not the problem.

@abellina
Contributor Author

@vanzin will update it. Thanks

@abellina abellina changed the title [SPARK-26895][CORE] proxy-users issue submitting applications with archives and other resources [SPARK-26895][CORE] prepareSubmitEnvironment should be called within doAs for proxy users Feb 19, 2019
@SparkQA

SparkQA commented Feb 20, 2019

Test build #102518 has finished for PR 23806 at commit 224d489.

  • This patch fails PySpark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@abellina
Contributor Author

abellina commented Feb 20, 2019

Hmm, the Python tests failed, though I'm not sure it's related. The tests point at the /home/jenkins/workspace/SparkPullRequestBuilder/python/unit-tests.log file, but it seems I can't get to it from the Jenkins UI (are all PRs writing to the same file?). Is this a known issue? Can we move it to the artifacts? (I created https://issues.apache.org/jira/browse/SPARK-26944 for this.)

From the error, it seems that shutil.rmtree is trying to delete a directory while another thread is likely still adding entries to it, so it blows up when it gets to os.rmdir(path). I think the test should call q.awaitTermination after q.stop before going on. (I filed https://issues.apache.org/jira/browse/SPARK-26945 for this streaming issue.)

ERROR: test_query_manager_await_termination (pyspark.sql.tests.test_streaming.StreamingTests)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/home/jenkins/workspace/SparkPullRequestBuilder/python/pyspark/sql/tests/test_streaming.py", line 259, in test_query_manager_await_termination
    shutil.rmtree(tmpPath)
  File "/home/anaconda/lib/python2.7/shutil.py", line 256, in rmtree
    onerror(os.rmdir, path, sys.exc_info())
  File "/home/anaconda/lib/python2.7/shutil.py", line 254, in rmtree
    os.rmdir(path)
OSError: [Errno 39] Directory not empty: '/home/jenkins/workspace/SparkPullRequestBuilder/python/target/072153bd-f981-47be-bda2-e2b657a16f65/tmp4WGp7n'
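
As a rough sketch of the suggested fix, the following uses a plain thread as a hypothetical stand-in for the streaming query; it is not the PySpark test itself. The `writer` and `run_and_clean_up` names are made up, and the mapping of `stop_event.set()` to `q.stop` and `worker.join()` to `q.awaitTermination` is only a loose analogy. The point is that the directory is removed only after the writer has fully finished.

```python
import os
import shutil
import tempfile
import threading
import time


def writer(directory, stop_event):
    """Hypothetical stand-in for a query that keeps adding files."""
    i = 0
    while not stop_event.is_set():
        with open(os.path.join(directory, f"part-{i}"), "w") as f:
            f.write("data")
        i += 1
        time.sleep(0.001)


def run_and_clean_up():
    tmp_path = tempfile.mkdtemp()
    stop_event = threading.Event()
    worker = threading.Thread(target=writer, args=(tmp_path, stop_event))
    worker.start()

    time.sleep(0.05)         # let the writer create some files
    stop_event.set()         # loosely analogous to q.stop: request shutdown
    worker.join()            # loosely analogous to waiting for termination
    shutil.rmtree(tmp_path)  # nothing is writing to tmp_path anymore
    return tmp_path
```

Without the `worker.join()` step, `shutil.rmtree` could list the directory, delete what it saw, and then fail in `os.rmdir` because a new file appeared in between, which matches the shape of the traceback above.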

@abellina
Contributor Author

ok to test

@vanzin
Contributor

vanzin commented Feb 21, 2019

retest this please

@SparkQA

SparkQA commented Feb 21, 2019

Test build #102590 has finished for PR 23806 at commit 224d489.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@vanzin
Contributor

vanzin commented Feb 22, 2019

Merging to master.

@vanzin vanzin closed this in 79a6504 Feb 22, 2019
vanzin pushed a commit that referenced this pull request Feb 28, 2019
…tEnvironment` in SparkSubmit

## What changes were proposed in this pull request?

Currently, if I run `spark-shell` locally, it shows the logs below:

```
$ ./bin/spark-shell
...
19/02/28 04:42:43 INFO SecurityManager: Changing view acls to: hkwon
19/02/28 04:42:43 INFO SecurityManager: Changing modify acls to: hkwon
19/02/28 04:42:43 INFO SecurityManager: Changing view acls groups to:
19/02/28 04:42:43 INFO SecurityManager: Changing modify acls groups to:
19/02/28 04:42:43 INFO SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users  with view permissions: Set(hkwon); groups with view permissions: Set(); users  with modify permissions: Set(hkwon); groups with modify permissions: Set()
19/02/28 04:42:43 INFO SignalUtils: Registered signal handler for INT
19/02/28 04:42:48 INFO SparkContext: Running Spark version 3.0.0-SNAPSHOT
19/02/28 04:42:48 INFO SparkContext: Submitted application: Spark shell
19/02/28 04:42:48 INFO SecurityManager: Changing view acls to: hkwon
```

The cause seems to be #23806: `prepareSubmitEnvironment` appears to reinitialize logging.

This PR proposes to uninitialize logging later, after `prepareSubmitEnvironment`.

## How was this patch tested?

Manually tested.

Closes #23911 from HyukjinKwon/SPARK-26895.

Authored-by: Hyukjin Kwon <[email protected]>
Signed-off-by: Marcelo Vanzin <[email protected]>
vanzin pushed a commit to vanzin/spark that referenced this pull request Aug 20, 2019
…lled within doAs for proxy users

`prepareSubmitEnvironment` performs globbing that fails when a proxy user (`--proxy-user`) doesn't have permission to access the file. The bug is also present in 2.3, so we should backport the fix: currently you can't launch an application that, for instance, passes a file under `--archives` when that file is owned by the target user.

The solution is to call `prepareSubmitEnvironment` within a doAs context if proxying.

Manual tests running with `--proxy-user` and `--archives`, before and after, showing that the globbing is successful when the resource is owned by the target user.

I've looked at writing unit tests, but I am not sure I can do that cleanly (perhaps with a custom FileSystem). Open to ideas.

Please review http://spark.apache.org/contributing.html before opening a pull request.

Closes apache#23806 from abellina/SPARK-26895_prepareSubmitEnvironment_from_doAs.

Lead-authored-by: Alessandro Bellina <[email protected]>
Co-authored-by: Alessandro Bellina <[email protected]>
Signed-off-by: Marcelo Vanzin <[email protected]>
(cherry picked from commit 79a6504)
Signed-off-by: Marcelo Vanzin <[email protected]>
vanzin pushed a commit to vanzin/spark that referenced this pull request Aug 21, 2019
…tEnvironment` in SparkSubmit

Currently, if I run `spark-shell` locally, it shows the logs below:

```
$ ./bin/spark-shell
...
19/02/28 04:42:43 INFO SecurityManager: Changing view acls to: hkwon
19/02/28 04:42:43 INFO SecurityManager: Changing modify acls to: hkwon
19/02/28 04:42:43 INFO SecurityManager: Changing view acls groups to:
19/02/28 04:42:43 INFO SecurityManager: Changing modify acls groups to:
19/02/28 04:42:43 INFO SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users  with view permissions: Set(hkwon); groups with view permissions: Set(); users  with modify permissions: Set(hkwon); groups with modify permissions: Set()
19/02/28 04:42:43 INFO SignalUtils: Registered signal handler for INT
19/02/28 04:42:48 INFO SparkContext: Running Spark version 3.0.0-SNAPSHOT
19/02/28 04:42:48 INFO SparkContext: Submitted application: Spark shell
19/02/28 04:42:48 INFO SecurityManager: Changing view acls to: hkwon
```

The cause seems to be apache#23806: `prepareSubmitEnvironment` appears to reinitialize logging.

This PR proposes to uninitialize logging later, after `prepareSubmitEnvironment`.

Manually tested.

Closes apache#23911 from HyukjinKwon/SPARK-26895.

Authored-by: Hyukjin Kwon <[email protected]>
Signed-off-by: Marcelo Vanzin <[email protected]>
(cherry picked from commit 6e31ccf)