-
Notifications
You must be signed in to change notification settings - Fork 29k
Revert "[SPARK-35672][CORE][YARN] Pass user classpath entries to executors using config instead of command line" #34082
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
…utors using config instead of command line" This reverts commit 866df69.
|
cc @xkrogen, @tgravescs |
tgravescs
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
looks like a clean revert. Would be great to get @xkrogen to comment
|
Note we will need to revert from the spark 3.1 branch as well. |
|
Test build #143562 has finished for PR 34082 at commit
|
|
Kubernetes integration test unable to build dist. exiting with code: 1 |
|
Thanks for reporting this @peter-toth ! cc @mridulm as well. |
I'm open to other kind of fixes, just wanted to unblock RCs quickly with this revert... |
|
Ack, let me see if I can get at least a POC fix-forward up by today, and if we think it looks good we can continue with that approach. If I'm not able to get something up by today, we revert to unblock the RC. Sound okay? |
|
Kubernetes integration test starting |
|
Kubernetes integration test status failure |
|
Test build #143563 has finished for PR 34082 at commit
|
|
Put up #34084, PTAL folks. |
|
oh |
|
yeah, let's revert it first. |
|
cc @gengliangwang . I think it has to go through up to branch-3.1. EDIT: just saw #34082 (comment) :-). |
|
Merged to master. |
|
@peter-toth, it has a conflict on branch-3.2. It seems trivial but I think we should better make a PR since we will release right away. mind creating a backporting Pr please? |
|
Yeah let's revert it now and fix it later. |
|
I created a backport PR in #34088 |
|
Thx! |
…ing config instead of command line ### What changes were proposed in this pull request? Refactor the logic for constructing the user classpath from `yarn.ApplicationMaster` into `yarn.Client` so that it can be leveraged on the executor side as well, instead of having the driver construct it and pass it to the executor via command-line arguments. A new method, `getUserClassPath`, is added to `CoarseGrainedExecutorBackend` which defaults to `Nil` (consistent with the existing behavior where non-YARN resource managers do not configure the user classpath). `YarnCoarseGrainedExecutorBackend` overrides this to construct the user classpath from the existing `APP_JAR` and `SECONDARY_JARS` configs. Within `yarn.Client`, environment variables in the configured paths are resolved before constructing the classpath. Please note that this is a re-submission of #32810, which was reverted in #34082 due to the issues described in [this comment](https://issues.apache.org/jira/browse/SPARK-35672?focusedCommentId=17419285&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-17419285). This PR additionally includes the changes described in #34084 to resolve the issue, though this PR has been enhanced to properly handle escape strings, unlike #34084. ### Why are the changes needed? User-provided JARs are made available to executors using a custom classloader, so they do not appear on the standard Java classpath. Instead, they are passed as a list to the executor which then creates a classloader out of the URLs. Currently in the case of YARN, this list of JARs is crafted by the Driver (in `ExecutorRunnable`), which then passes the information to the executors (`CoarseGrainedExecutorBackend`) by specifying each JAR on the executor command line as `--user-class-path /path/to/myjar.jar`. This can cause extremely long argument lists when there are many JARs, which can cause the OS argument length to be exceeded, typically manifesting as the error message: > /bin/bash: Argument list too long A [Google search](https://www.google.com/search?q=spark%20%22%2Fbin%2Fbash%3A%20argument%20list%20too%20long%22&oq=spark%20%22%2Fbin%2Fbash%3A%20argument%20list%20too%20long%22) indicates that this is not a theoretical problem and afflicts real users, including ours. Passing this list using the configurations instead resolves this issue. ### Does this PR introduce _any_ user-facing change? There is one small behavioral change which is a bug fix. Previously the `spark.yarn.config.gatewayPath` and `spark.yarn.config.replacementPath` options were only applied to executors, meaning they would not work for the driver when running in cluster mode. This appears to be a bug; the [documentation for this functionality](https://spark.apache.org/docs/latest/running-on-yarn.html) does not mention any limitations that this is only for executors. This PR fixes that issue. Additionally, this fixes the main bash argument length issue, allowing for larger JAR lists to be passed successfully. Configuration of JARs is identical to before, and substitution of environment variables in `spark.jars` or `spark.yarn.config.replacementPath` works as expected. ### How was this patch tested? New unit tests were added in `YarnClusterSuite`. Also, we have been running a similar fix internally for 4 months with great success. Closes #34120 from xkrogen/xkrogen-SPARK-35672-yarn-classpath-list-take2. Authored-by: Erik Krogen <[email protected]> Signed-off-by: attilapiros <[email protected]>
What changes were proposed in this pull request?
This reverts commit 866df69.
Why are the changes needed?
After the change environment variables were not substituted in user classpath entries. Please find an example on SPARK-35672.
Does this PR introduce any user-facing change?
No.
How was this patch tested?
Existing tests.