[SPARK-14963][Yarn] Using recoveryPath if NM recovery is enabled #12994
Conversation
Test build #58116 has finished for PR 12994 at commit
    }

    /**
     * Get the recovery path, this will override the default one to get the our own maintained
nit: "to get our" remove "the"
A few minor comments, but mostly looks good. Did you build against both hadoop 2.5+ and hadoop < 2.5? Did you manually test the upgrade path?
Thanks @tgravescs for your comments, I will change the code and do more comprehensive testing accordingly.
@tgravescs, I tested locally using Hadoop 2.4 and 2.6 with different scenarios:
It looks fine in all these scenarios. One remaining question is whether we need to handle downgrade scenarios, like 2.6 to 2.4, or NM recovery going from enabled to disabled.
Test build #58201 has finished for PR 12994 at commit
Test build #58205 has finished for PR 12994 at commit
Jenkins, retest this please.
Test build #58207 has finished for PR 12994 at commit
Test build #58210 has finished for PR 12994 at commit
Test build #58216 has finished for PR 12994 at commit
I'm not concerned with the downgrade case. The service just won't find the file if YARN isn't setting the recovery path any longer (it will create a new one in the local dir), but I don't see that as a big issue: if someone is downgrading their cluster or turning off recovery, they should kill everything that is running.
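The upgrade path discussed above (NM recovery newly enabled, but the shuffle service's state DB still living in an old NM local dir) can be sketched as below. This is an illustrative sketch, not the exact patch code: the class name, method name, and the `registeredExecutors.ldb` file name used in the test are assumptions for this example.

```java
import java.io.File;
import java.io.IOException;
import java.nio.file.Files;

// Illustrative sketch of the upgrade handling discussed above: if a
// recovery path is now provided but the state DB still lives in one of
// the NM local dirs, move it into the recovery path so it is found on
// future NM restarts. Names here are assumptions, not the patch's own.
public class DbMigrationSketch {
    static File initRecoveryDb(File recoveryPath, File[] localDirs, String dbName)
            throws IOException {
        File target = new File(recoveryPath, dbName);
        if (!target.exists()) {
            for (File dir : localDirs) {
                File legacyDb = new File(dir, dbName);
                if (legacyDb.exists()) {
                    // Move the legacy DB into the NM-maintained recovery path.
                    Files.move(legacyDb.toPath(), target.toPath());
                    break;
                }
            }
        }
        return target;
    }
}
```

On a fresh install neither file exists, so the returned path simply points at where the new DB will be created under the recovery path.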
+1 Thanks @jerryshao |
What changes were proposed in this pull request?
Since Hadoop 2.5, the YARN NodeManager supports NM recovery, which provides a recovery path for the state of auxiliary services such as spark_shuffle and mapreduce_shuffle. This change makes the shuffle service use that path instead of an NM local dir when NM recovery is enabled.
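The selection logic described above can be sketched as follows. This is a minimal sketch of the idea, not the patch itself; the class and field names are illustrative assumptions.

```java
import java.io.File;

// Minimal sketch of the recovery-path selection described in the PR:
// prefer the NM-maintained recovery path (set by the NodeManager on
// Hadoop 2.5+ when NM recovery is enabled) and fall back to an NM local
// dir otherwise. Names are illustrative, not the patch's exact ones.
public class RecoveryPathSketch {
    // Null on older Hadoop versions or when NM recovery is disabled.
    private final File recoveryPath;

    // A configured NM local dir, used as the legacy fallback location.
    private final File localDir;

    public RecoveryPathSketch(File recoveryPath, File localDir) {
        this.recoveryPath = recoveryPath;
        this.localDir = localDir;
    }

    /** Directory that should hold the shuffle service's recovery state. */
    public File getRecoveryPath() {
        // Prefer the NM-maintained path so state survives NM restarts.
        return (recoveryPath != null) ? recoveryPath : localDir;
    }
}
```

With recovery disabled the behavior is unchanged from before the patch: state goes to the local dir, which is wiped or reallocated on NM restart.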
How was this patch tested?
Unit test + local test.