@jerryshao
Contributor

What changes were proposed in this pull request?

Starting with Hadoop 2.5, the YARN NodeManager (NM) supports NM recovery, which provides a recovery path for auxiliary services such as spark_shuffle and mapreduce_shuffle. This change uses that path instead of the NM local dir when NM recovery is enabled.
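
A minimal sketch of the selection described above, not the code from this patch. The class and method names (`RecoveryPathSketch`, `chooseStateDir`) are hypothetical, and it assumes the recovery path is simply absent when NM recovery is disabled or unsupported.

```java
import java.io.File;
import java.util.Optional;

// Hypothetical helper for illustration only; these names do not come from the patch.
final class RecoveryPathSketch {
    private RecoveryPathSketch() {}

    /**
     * Picks the directory where the shuffle service would keep its state.
     *
     * @param recoveryPath path supplied by the NM when recovery is enabled
     *                     (assumed empty when recovery is disabled or unsupported)
     * @param nmLocalDir   NM local dir, used as the fallback
     */
    static File chooseStateDir(Optional<File> recoveryPath, File nmLocalDir) {
        // Prefer the recovery path; otherwise keep the previous behaviour.
        return recoveryPath.orElse(nmLocalDir);
    }

    public static void main(String[] args) {
        File localDir = new File("nm-local-dir");
        File recovery = new File("nm-recovery/spark_shuffle");

        // NM recovery enabled: the recovery path is chosen.
        System.out.println(chooseStateDir(Optional.of(recovery), localDir));
        // NM recovery disabled (or Hadoop < 2.5): the local dir is chosen.
        System.out.println(chooseStateDir(Optional.empty(), localDir));
    }
}
```

Modelling the recovery path as an `Optional` keeps the fallback explicit; the actual service may represent "not set" differently, for example as null.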

How was this patch tested?

Unit test + local test.

@SparkQA

SparkQA commented May 9, 2016

Test build #58116 has finished for PR 12994 at commit 08557bf.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

}

/**
* Get the recovery path, this will override the default one to get the our own maintained
Contributor

nit: "to get our" remove "the"

@tgravescs
Contributor

A few minor comments, but mostly looks good. Did you build against both Hadoop 2.5+ and Hadoop < 2.5?

Did you manually test the upgrade path?

@jerryshao
Contributor Author

Thanks @tgravescs for your comments, I will change the code and do a more comprehensive test accordingly.

@jerryshao
Contributor Author

@tgravescs, I tested locally using Hadoop 2.4 and 2.6 with different scenarios:

  1. Only Hadoop 2.4
  2. Hadoop 2.4 upgrade to 2.6 with NM recovery disabled.
  3. Hadoop 2.4 upgrade to 2.6 with NM recovery enabled.
  4. Hadoop 2.6 NM recovery disabled to enabled.

Looks fine in all these scenarios.

One open question: do we need to handle downgrade scenarios, such as 2.6 to 2.4, or NM recovery going from enabled to disabled?

@SparkQA

SparkQA commented May 10, 2016

Test build #58201 has finished for PR 12994 at commit 4e5c2fd.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented May 10, 2016

Test build #58205 has finished for PR 12994 at commit 519bf07.

  • This patch fails MiMa tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@jerryshao
Contributor Author

Jenkins, retest this please.

@SparkQA

SparkQA commented May 10, 2016

Test build #58207 has finished for PR 12994 at commit 519bf07.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented May 10, 2016

Test build #58210 has finished for PR 12994 at commit 02752c9.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented May 10, 2016

Test build #58216 has finished for PR 12994 at commit 6d4a8f1.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@tgravescs
Contributor

I'm not concerned with the downgrade case. It just won't find the file if YARN is no longer setting the recovery path (it will create a new one in the local dir), but I don't see that as a big issue: anyone downgrading their cluster or turning off recovery should kill everything that is running.
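
The downgrade behaviour described here can be sketched roughly as follows. This is an illustration under assumptions, not the service's actual code: `StateFileLookupSketch`, `openOrCreate`, and the file name `state.db` are all hypothetical.

```java
import java.io.File;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;

// Hypothetical sketch; class, method, and file names are illustrative only.
final class StateFileLookupSketch {
    static final String STATE_FILE = "state.db"; // placeholder name, not the real file

    private StateFileLookupSketch() {}

    /** Returns the state file in {@code dir}, creating an empty one if none exists. */
    static File openOrCreate(File dir, String fileName) {
        File stateFile = new File(dir, fileName);
        if (!stateFile.exists()) {
            try {
                Files.createDirectories(dir.toPath());
                Files.createFile(stateFile.toPath());
            } catch (IOException e) {
                throw new UncheckedIOException(e);
            }
        }
        return stateFile;
    }

    public static void main(String[] args) throws IOException {
        File base = Files.createTempDirectory("nm-sketch").toFile();
        File recoveryDir = new File(base, "recovery");
        File localDir = new File(base, "local");

        // Recovery enabled: state is written under the recovery path.
        File before = openOrCreate(recoveryDir, STATE_FILE);
        Files.write(before.toPath(), "saved state".getBytes(StandardCharsets.UTF_8));

        // After a downgrade the recovery path is no longer set, so the lookup
        // falls back to the local dir, finds nothing, and starts from a new file.
        File after = openOrCreate(localDir, STATE_FILE);
        System.out.println("old state reused: " + after.equals(before));
        System.out.println("new state size: " + after.length());
    }
}
```

The old file under the recovery path is left untouched in this sketch; the previously saved state is simply not visible from the local dir, which matches the comment that running applications should be killed before a downgrade.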

@tgravescs
Contributor

+1 Thanks @jerryshao

@asfgit asfgit closed this in aab99d3 May 10, 2016