[SPARK-35672][CORE][YARN] Handle environment variable replacement in user classpath lists #34084
@peter-toth @tgravescs @mridulm FYI

It's not super clean to have to perform the variable resolution in Spark code, and in particular one aspect I left out is handling of escaping, so you can't use e.g.

The other approach would be to hide this behind a feature flag, but it's kind of a weird feature flag: "Turn on scalable user JAR handling to bypass argument length limits, but also turn off environment variable substitution." I think it would be confusing for users, so I'd prefer to avoid it if possible.
Test build #143572 has finished for PR 34084 at commit
Test build #143573 has finished for PR 34084 at commit
Kubernetes integration test starting
Kubernetes integration test starting
Kubernetes integration test status failure
Kubernetes integration test status failure
This can be closed since we have #34120, correct?
Yes, thanks for the reminder @tgravescs!
…ing config instead of command line

### What changes were proposed in this pull request?

Refactor the logic for constructing the user classpath from `yarn.ApplicationMaster` into `yarn.Client` so that it can be leveraged on the executor side as well, instead of having the driver construct it and pass it to the executor via command-line arguments. A new method, `getUserClassPath`, is added to `CoarseGrainedExecutorBackend` which defaults to `Nil` (consistent with the existing behavior where non-YARN resource managers do not configure the user classpath). `YarnCoarseGrainedExecutorBackend` overrides this to construct the user classpath from the existing `APP_JAR` and `SECONDARY_JARS` configs. Within `yarn.Client`, environment variables in the configured paths are resolved before constructing the classpath.

Please note that this is a re-submission of #32810, which was reverted in #34082 due to the issues described in [this comment](https://issues.apache.org/jira/browse/SPARK-35672?focusedCommentId=17419285&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-17419285). This PR additionally includes the changes described in #34084 to resolve the issue, though this PR has been enhanced to properly handle escape strings, unlike #34084.

### Why are the changes needed?

User-provided JARs are made available to executors using a custom classloader, so they do not appear on the standard Java classpath. Instead, they are passed as a list to the executor which then creates a classloader out of the URLs. Currently in the case of YARN, this list of JARs is crafted by the Driver (in `ExecutorRunnable`), which then passes the information to the executors (`CoarseGrainedExecutorBackend`) by specifying each JAR on the executor command line as `--user-class-path /path/to/myjar.jar`.

This can cause extremely long argument lists when there are many JARs, which can cause the OS argument length to be exceeded, typically manifesting as the error message:

> /bin/bash: Argument list too long

A [Google search](https://www.google.com/search?q=spark%20%22%2Fbin%2Fbash%3A%20argument%20list%20too%20long%22&oq=spark%20%22%2Fbin%2Fbash%3A%20argument%20list%20too%20long%22) indicates that this is not a theoretical problem and afflicts real users, including ours. Passing this list using the configurations instead resolves this issue.

### Does this PR introduce _any_ user-facing change?

There is one small behavioral change which is a bug fix. Previously the `spark.yarn.config.gatewayPath` and `spark.yarn.config.replacementPath` options were only applied to executors, meaning they would not work for the driver when running in cluster mode. This appears to be a bug; the [documentation for this functionality](https://spark.apache.org/docs/latest/running-on-yarn.html) does not mention any limitations that this is only for executors. This PR fixes that issue.

Additionally, this fixes the main bash argument length issue, allowing for larger JAR lists to be passed successfully. Configuration of JARs is identical to before, and substitution of environment variables in `spark.jars` or `spark.yarn.config.replacementPath` works as expected.

### How was this patch tested?

New unit tests were added in `YarnClusterSuite`. Also, we have been running a similar fix internally for 4 months with great success.

Closes #34120 from xkrogen/xkrogen-SPARK-35672-yarn-classpath-list-take2.

Authored-by: Erik Krogen <[email protected]>
Signed-off-by: attilapiros <[email protected]>
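To make the argument-length concern above concrete, here is a rough sketch (not Spark code) that estimates how large a command string becomes when every JAR is passed as `--user-class-path <jar>`. The helper names, the JAR paths, and the 128 KiB threshold are all assumptions chosen for illustration; real limits depend on the operating system and its configuration.

```python
def user_classpath_args(jar_paths):
    """Build the repeated `--user-class-path <jar>` arguments (hypothetical helper)."""
    args = []
    for jar in jar_paths:
        args.extend(["--user-class-path", jar])
    return args


def command_line_bytes(args):
    """Size in bytes of the arguments joined by single spaces into one command string."""
    return len(" ".join(args).encode("utf-8"))


# Hypothetical input: 5,000 JARs under an invented directory.
jars = [f"/opt/app/lib/dependency-{i:04d}.jar" for i in range(5000)]
total = command_line_bytes(user_classpath_args(jars))

# Assumed, illustrative threshold only; not a value taken from Spark or YARN.
ASSUMED_LIMIT = 128 * 1024

print(f"{len(jars)} JARs -> {total} bytes; exceeds assumed limit: {total > ASSUMED_LIMIT}")
```

The size grows linearly with the number of JARs, which is why moving the list into configuration, as this commit describes, sidesteps the problem regardless of where the exact limit sits.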
What changes were proposed in this pull request?
Add environment variable resolution logic to `yarn.Client.getUserClasspathUrls`, which allows users to specify JAR paths (e.g. from `spark.jars`) that contain references to environment variables. This is a best-effort attempt to mimic the variable resolution logic used by a typical shell (implemented in `yarn.Client.replaceEnvVars`).
Why are the changes needed?
In PR #32810 the way user JAR classpaths were passed around was changed to avoid passing them via the command line, which is prone to exceeding maximum argument length limitations. However, as a result, the classpaths are no longer interpreted by the shell, so environment variables are not resolved. This is explicitly called out in the docs of `spark.yarn.config.gatewayPath` as a use case, so we definitely need to continue supporting it. There are more details in the comments of SPARK-35672.
Does this PR introduce any user-facing change?
Yes, using environment variables in `spark.jars` or `spark.yarn.config.replacementPath` will work again, as it did before PR #32810.
How was this patch tested?
New unit tests added for this specific case in `YarnClusterSuite`.
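
As an illustration of the kind of shell-like resolution described here, the following is a minimal sketch in Python. It is not the `yarn.Client.replaceEnvVars` implementation: the function names, the regular expression, and the sample environment are assumptions made for this example. It handles only `$NAME` and `${NAME}` references and, like the approach in this PR, does no escape handling.

```python
import re

# Matches ${NAME} (group 1) or $NAME (group 2); NAME follows shell identifier rules.
_ENV_REF = re.compile(r"\$(?:\{([A-Za-z_][A-Za-z0-9_]*)\}|([A-Za-z_][A-Za-z0-9_]*))")


def replace_env_vars(path, env):
    """Return `path` with $NAME and ${NAME} references replaced from `env`.

    Unset variables expand to the empty string, as an unquoted reference
    does in a POSIX shell by default. Backslash escapes are NOT handled.
    """
    def lookup(match):
        name = match.group(1) or match.group(2)
        return env.get(name, "")

    return _ENV_REF.sub(lookup, path)


def resolve_user_classpath(paths, env):
    """Apply the substitution to every configured JAR path (hypothetical helper)."""
    return [replace_env_vars(p, env) for p in paths]


# Hypothetical environment and JAR paths, for illustration only.
env = {"HADOOP_HOME": "/opt/hadoop", "APP_VERSION": "1.2"}
jars = ["$HADOOP_HOME/lib/extra.jar", "/apps/${APP_VERSION}/app.jar", "/plain/path.jar"]
resolved = resolve_user_classpath(jars, env)
print(resolved)
```

Because escapes are ignored in this sketch, a path written as `\$HADOOP_HOME` is still expanded and keeps its leading backslash. That is the limitation noted in the conversation for this PR, and the follow-up #34120 is described as handling escape strings properly.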