[SPARK-35672][CORE][YARN] Handle environment variable replacement in user classpath lists #34084

xkrogen · 2021-09-23T23:14:15Z

What changes were proposed in this pull request?

Add environment variable resolution logic to yarn.Client.getUserClasspathUrls, which allows for users to specify JAR paths (e.g. from spark.jars) which contain references to environment variables.

This is a best-effort attempt to mimic the variable resolution logic used by a typical shell (implemented in yarn.Client.replaceEnvVars).

Why are the changes needed?

In PR #32810 the way user JAR classpaths were passed around was changed to avoid passing them via the command line, which is prone to exceeding maximum argument length limitations. However, as a result, the classpaths are no longer interpreted by the shell, so environment variables are not resolved. This is explicitly called out in the docs of spark.yarn.config.gatewayPath as a use case, so we definitely need to continue supporting it. There are more details in the comments of SPARK-36572.

Does this PR introduce any user-facing change?

Yes, using environment variables in spark.jars or spark.yarn.config.replacementPath will work again, as it did before PR #32810.

How was this patch tested?

New unit tests added for this specific case in YarnClusterSuite.

… lists

xkrogen · 2021-09-23T23:21:07Z

@peter-toth @tgravescs @mridulm FYI

It's not super clean to have to perform the variable resolution in Spark code, and in particular one aspect I left out is handling of escaping, so you can't use e.g. /home/\$usr/ to actually refer to the path /home/$usr -- usr will be resolved as a variable. Similarly on Windows, \Home\%%usr\ wouldn't work as expected, resolving to \Home\usr instead of \Home\%usr. Handling the escapes is a lot trickier than just doing variable substitution and I'm not sure it would be a good idea to try to handle it via simple regexes as I'm doing for the substitution now. But I think the current logic should cover real-world use cases. Open to other ideas here, though. One option would be to spin up an external shell process to do the resolution, but it seems messy and could be quite slow for large lists.

The other approach would be to hide this behind a feature flag, but it's kind of a weird feature flag. "Turn on scalable user JAR handling to bypass argument length limits, but also turn off environment variable substitution." I think it will be confusing for users so I'd prefer to avoid it if possible.

SparkQA · 2021-09-23T23:21:29Z

Test build #143572 has finished for PR 34084 at commit e36814b.

This patch fails Scala style tests.
This patch merges cleanly.
This patch adds no public classes.

SparkQA · 2021-09-23T23:34:08Z

Test build #143573 has finished for PR 34084 at commit 341a2d3.

This patch fails Scala style tests.
This patch merges cleanly.
This patch adds no public classes.

SparkQA · 2021-09-23T23:58:47Z

Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/48081/

SparkQA · 2021-09-24T00:11:17Z

Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/48082/

SparkQA · 2021-09-24T00:43:49Z

Kubernetes integration test status failure
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/48081/

SparkQA · 2021-09-24T01:10:41Z

Kubernetes integration test status failure
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/48082/

tgravescs · 2021-09-29T13:47:43Z

this can be closed since we have #34120, correct?

xkrogen · 2021-09-29T15:55:46Z

Yes, thanks for the reminder @tgravescs !

…ing config instead of command line ### What changes were proposed in this pull request? Refactor the logic for constructing the user classpath from `yarn.ApplicationMaster` into `yarn.Client` so that it can be leveraged on the executor side as well, instead of having the driver construct it and pass it to the executor via command-line arguments. A new method, `getUserClassPath`, is added to `CoarseGrainedExecutorBackend` which defaults to `Nil` (consistent with the existing behavior where non-YARN resource managers do not configure the user classpath). `YarnCoarseGrainedExecutorBackend` overrides this to construct the user classpath from the existing `APP_JAR` and `SECONDARY_JARS` configs. Within `yarn.Client`, environment variables in the configured paths are resolved before constructing the classpath. Please note that this is a re-submission of #32810, which was reverted in #34082 due to the issues described in [this comment](https://issues.apache.org/jira/browse/SPARK-35672?focusedCommentId=17419285&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-17419285). This PR additionally includes the changes described in #34084 to resolve the issue, though this PR has been enhanced to properly handle escape strings, unlike #34084. ### Why are the changes needed? User-provided JARs are made available to executors using a custom classloader, so they do not appear on the standard Java classpath. Instead, they are passed as a list to the executor which then creates a classloader out of the URLs. Currently in the case of YARN, this list of JARs is crafted by the Driver (in `ExecutorRunnable`), which then passes the information to the executors (`CoarseGrainedExecutorBackend`) by specifying each JAR on the executor command line as `--user-class-path /path/to/myjar.jar`. This can cause extremely long argument lists when there are many JARs, which can cause the OS argument length to be exceeded, typically manifesting as the error message: > /bin/bash: Argument list too long A [Google search](https://www.google.com/search?q=spark%20%22%2Fbin%2Fbash%3A%20argument%20list%20too%20long%22&oq=spark%20%22%2Fbin%2Fbash%3A%20argument%20list%20too%20long%22) indicates that this is not a theoretical problem and afflicts real users, including ours. Passing this list using the configurations instead resolves this issue. ### Does this PR introduce _any_ user-facing change? There is one small behavioral change which is a bug fix. Previously the `spark.yarn.config.gatewayPath` and `spark.yarn.config.replacementPath` options were only applied to executors, meaning they would not work for the driver when running in cluster mode. This appears to be a bug; the [documentation for this functionality](https://spark.apache.org/docs/latest/running-on-yarn.html) does not mention any limitations that this is only for executors. This PR fixes that issue. Additionally, this fixes the main bash argument length issue, allowing for larger JAR lists to be passed successfully. Configuration of JARs is identical to before, and substitution of environment variables in `spark.jars` or `spark.yarn.config.replacementPath` works as expected. ### How was this patch tested? New unit tests were added in `YarnClusterSuite`. Also, we have been running a similar fix internally for 4 months with great success. Closes #34120 from xkrogen/xkrogen-SPARK-35672-yarn-classpath-list-take2. Authored-by: Erik Krogen <[email protected]> Signed-off-by: attilapiros <[email protected]>

xkrogen added 2 commits September 23, 2021 16:06

SPARK-35672 Handle environment variable replacement in user classpath…

a7670ad

… lists

expand the scope of replacement to not only cluster paths

e36814b

github-actions bot added the YARN label Sep 23, 2021

xkrogen changed the title ~~SPARK-35672 Handle environment variable replacement in user classpath lists~~ [SPARK-35672][CORE][YARN] Handle environment variable replacement in user classpath lists Sep 23, 2021

xkrogen mentioned this pull request Sep 23, 2021

Revert "[SPARK-35672][CORE][YARN] Pass user classpath entries to executors using config instead of command line" #34082

Closed

fix style issue (hopefully)

341a2d3

xkrogen mentioned this pull request Sep 27, 2021

[SPARK-35672][CORE][YARN] Pass user classpath entries to executors using config instead of command line. #34120

Closed

xkrogen closed this Sep 29, 2021

xkrogen deleted the xkrogen-SPARK-35672-yarn-classpath-list branch September 29, 2021 15:55

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

[SPARK-35672][CORE][YARN] Handle environment variable replacement in user classpath lists #34084

[SPARK-35672][CORE][YARN] Handle environment variable replacement in user classpath lists #34084

Uh oh!

xkrogen commented Sep 23, 2021

Uh oh!

xkrogen commented Sep 23, 2021 •

edited

Loading

Uh oh!

SparkQA commented Sep 23, 2021

Uh oh!

SparkQA commented Sep 23, 2021

Uh oh!

SparkQA commented Sep 23, 2021

Uh oh!

SparkQA commented Sep 24, 2021

Uh oh!

SparkQA commented Sep 24, 2021

Uh oh!

SparkQA commented Sep 24, 2021

Uh oh!

tgravescs commented Sep 29, 2021

Uh oh!

xkrogen commented Sep 29, 2021

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

[SPARK-35672][CORE][YARN] Handle environment variable replacement in user classpath lists #34084

[SPARK-35672][CORE][YARN] Handle environment variable replacement in user classpath lists #34084

Uh oh!

Conversation

xkrogen commented Sep 23, 2021

What changes were proposed in this pull request?

Why are the changes needed?

Does this PR introduce any user-facing change?

How was this patch tested?

Uh oh!

xkrogen commented Sep 23, 2021 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

SparkQA commented Sep 23, 2021

Uh oh!

SparkQA commented Sep 23, 2021

Uh oh!

SparkQA commented Sep 23, 2021

Uh oh!

SparkQA commented Sep 24, 2021

Uh oh!

SparkQA commented Sep 24, 2021

Uh oh!

SparkQA commented Sep 24, 2021

Uh oh!

tgravescs commented Sep 29, 2021

Uh oh!

xkrogen commented Sep 29, 2021

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

xkrogen commented Sep 23, 2021 •

edited

Loading