Skip to content

Conversation

@xkrogen
Copy link
Contributor

@xkrogen xkrogen commented Sep 23, 2021

What changes were proposed in this pull request?

Add environment variable resolution logic to yarn.Client.getUserClasspathUrls, which allows for users to specify JAR paths (e.g. from spark.jars) which contain references to environment variables.

This is a best-effort attempt to mimic the variable resolution logic used by a typical shell (implemented in yarn.Client.replaceEnvVars).

Why are the changes needed?

In PR #32810 the way user JAR classpaths were passed around was changed to avoid passing them via the command line, which is prone to exceeding maximum argument length limitations. However, as a result, the classpaths are no longer interpreted by the shell, so environment variables are not resolved. This is explicitly called out in the docs of spark.yarn.config.gatewayPath as a use case, so we definitely need to continue supporting it. There are more details in the comments of SPARK-36572.

Does this PR introduce any user-facing change?

Yes, using environment variables in spark.jars or spark.yarn.config.replacementPath will work again, as it did before PR #32810.

How was this patch tested?

New unit tests added for this specific case in YarnClusterSuite.

@github-actions github-actions bot added the YARN label Sep 23, 2021
@xkrogen xkrogen changed the title SPARK-35672 Handle environment variable replacement in user classpath lists [SPARK-35672][CORE][YARN] Handle environment variable replacement in user classpath lists Sep 23, 2021
@xkrogen
Copy link
Contributor Author

xkrogen commented Sep 23, 2021

@peter-toth @tgravescs @mridulm FYI

It's not super clean to have to perform the variable resolution in Spark code, and in particular one aspect I left out is handling of escaping, so you can't use e.g. /home/\$usr/ to actually refer to the path /home/$usr -- usr will be resolved as a variable. Similarly on Windows, \Home\%%usr\ wouldn't work as expected, resolving to \Home\usr instead of \Home\%usr. Handling the escapes is a lot trickier than just doing variable substitution and I'm not sure it would be a good idea to try to handle it via simple regexes as I'm doing for the substitution now. But I think the current logic should cover real-world use cases. Open to other ideas here, though. One option would be to spin up an external shell process to do the resolution, but it seems messy and could be quite slow for large lists.

The other approach would be to hide this behind a feature flag, but it's kind of a weird feature flag. "Turn on scalable user JAR handling to bypass argument length limits, but also turn off environment variable substitution." I think it will be confusing for users so I'd prefer to avoid it if possible.

@SparkQA
Copy link

SparkQA commented Sep 23, 2021

Test build #143572 has finished for PR 34084 at commit e36814b.

  • This patch fails Scala style tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented Sep 23, 2021

Test build #143573 has finished for PR 34084 at commit 341a2d3.

  • This patch fails Scala style tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented Sep 23, 2021

Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/48081/

@SparkQA
Copy link

SparkQA commented Sep 24, 2021

Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/48082/

@SparkQA
Copy link

SparkQA commented Sep 24, 2021

Kubernetes integration test status failure
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/48081/

@SparkQA
Copy link

SparkQA commented Sep 24, 2021

Kubernetes integration test status failure
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/48082/

@tgravescs
Copy link
Contributor

this can be closed since we have #34120, correct?

@xkrogen xkrogen closed this Sep 29, 2021
@xkrogen
Copy link
Contributor Author

xkrogen commented Sep 29, 2021

Yes, thanks for the reminder @tgravescs !

@xkrogen xkrogen deleted the xkrogen-SPARK-35672-yarn-classpath-list branch September 29, 2021 15:55
attilapiros pushed a commit that referenced this pull request Nov 16, 2021
…ing config instead of command line

### What changes were proposed in this pull request?
Refactor the logic for constructing the user classpath from `yarn.ApplicationMaster` into `yarn.Client` so that it can be leveraged on the executor side as well, instead of having the driver construct it and pass it to the executor via command-line arguments. A new method, `getUserClassPath`, is added to `CoarseGrainedExecutorBackend` which defaults to `Nil` (consistent with the existing behavior where non-YARN resource managers do not configure the user classpath). `YarnCoarseGrainedExecutorBackend` overrides this to construct the user classpath from the existing `APP_JAR` and `SECONDARY_JARS` configs. Within `yarn.Client`, environment variables in the configured paths are resolved before constructing the classpath.

Please note that this is a re-submission of #32810, which was reverted in #34082 due to the issues described in [this comment](https://issues.apache.org/jira/browse/SPARK-35672?focusedCommentId=17419285&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-17419285). This PR additionally includes the changes described in #34084 to resolve the issue, though this PR has been enhanced to properly handle escape strings, unlike #34084.

### Why are the changes needed?
User-provided JARs are made available to executors using a custom classloader, so they do not appear on the standard Java classpath. Instead, they are passed as a list to the executor which then creates a classloader out of the URLs. Currently in the case of YARN, this list of JARs is crafted by the Driver (in `ExecutorRunnable`), which then passes the information to the executors (`CoarseGrainedExecutorBackend`) by specifying each JAR on the executor command line as `--user-class-path /path/to/myjar.jar`. This can cause extremely long argument lists when there are many JARs, which can cause the OS argument length to be exceeded, typically manifesting as the error message:

> /bin/bash: Argument list too long

A [Google search](https://www.google.com/search?q=spark%20%22%2Fbin%2Fbash%3A%20argument%20list%20too%20long%22&oq=spark%20%22%2Fbin%2Fbash%3A%20argument%20list%20too%20long%22) indicates that this is not a theoretical problem and afflicts real users, including ours. Passing this list using the configurations instead resolves this issue.

### Does this PR introduce _any_ user-facing change?
There is one small behavioral change which is a bug fix. Previously the `spark.yarn.config.gatewayPath` and `spark.yarn.config.replacementPath` options were only applied to executors, meaning they would not work for the driver when running in cluster mode. This appears to be a bug; the [documentation for this functionality](https://spark.apache.org/docs/latest/running-on-yarn.html) does not mention any limitations that this is only for executors. This PR fixes that issue.

Additionally, this fixes the main bash argument length issue, allowing for larger JAR lists to be passed successfully. Configuration of JARs is identical to before, and substitution of environment variables in `spark.jars` or `spark.yarn.config.replacementPath` works as expected.

### How was this patch tested?
New unit tests were added in `YarnClusterSuite`. Also, we have been running a similar fix internally for 4 months with great success.

Closes #34120 from xkrogen/xkrogen-SPARK-35672-yarn-classpath-list-take2.

Authored-by: Erik Krogen <[email protected]>
Signed-off-by: attilapiros <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants