
@wang-zhun
Contributor

@wang-zhun wang-zhun commented May 6, 2020

What changes were proposed in this pull request?

Update the input parameters for instantiating RMAppManager and ClientRMService

Why are the changes needed?

With Hadoop 3.2, if RMAppManager is not instantiated correctly, the following exception occurs:

java.lang.RuntimeException: java.lang.NullPointerException
	at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:135)
	at org.apache.hadoop.yarn.security.YarnAuthorizationProvider.getInstance(YarnAuthorizationProvider.java:55)
	at org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.<init>(RMAppManager.java:117)
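
The trace is consistent with a NullPointerException thrown inside a reflectively invoked constructor and rethrown as a RuntimeException. A minimal Java sketch of that wrapping pattern, for illustration only: `FakeProvider` and `newInstance` below are hypothetical stand-ins and are not Hadoop's actual classes or code.

```java
import java.lang.reflect.Constructor;
import java.lang.reflect.InvocationTargetException;

public class ReflectiveWrapDemo {
    // Hypothetical provider whose constructor dereferences a missing dependency.
    public static class FakeProvider {
        public FakeProvider() {
            String conf = null;   // stands in for state that was never set up
            conf.length();        // throws NullPointerException
        }
    }

    // Hypothetical helper: instantiate reflectively and rethrow any failure
    // as a RuntimeException whose cause is the original exception.
    public static <T> T newInstance(Class<T> cls) {
        try {
            Constructor<T> ctor = cls.getDeclaredConstructor();
            return ctor.newInstance();
        } catch (InvocationTargetException e) {
            // The constructor itself threw; unwrap to the real cause.
            throw new RuntimeException(e.getCause());
        } catch (ReflectiveOperationException e) {
            throw new RuntimeException(e);
        }
    }

    // Reports the outer exception type and its cause type.
    public static String describeFailure() {
        try {
            newInstance(FakeProvider.class);
            return "no failure";
        } catch (RuntimeException e) {
            return e.getClass().getName() + ": " + e.getCause().getClass().getName();
        }
    }

    public static void main(String[] args) {
        System.out.println(describeFailure());
    }
}
```

The point of the sketch is that the outer RuntimeException only reports where the reflective call happened; the cause is an object that was not initialized before the constructor ran.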

How was this patch tested?

Unit tests (UTs).

@wang-zhun
Contributor Author

test this please

@viirya
Member

viirya commented May 6, 2020

ok to test

@SparkQA

SparkQA commented May 6, 2020

Test build #122330 has finished for PR 28456 at commit 1e415a9.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented May 6, 2020

Test build #122331 has finished for PR 28456 at commit 42f090c.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented May 6, 2020

Test build #122337 has finished for PR 28456 at commit fd08a7a.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented May 6, 2020

Test build #122361 has finished for PR 28456 at commit 135ac1f.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@wang-zhun
Contributor Author

wang-zhun commented May 6, 2020

@viirya @tgravescs How do I specify the Hadoop profile "hadoop3.2" when running the tests?

@tgravescs tgravescs changed the title [SPARK-31235][YARN] Fix test "specify a more specific type for the ap… [SPARK-31235][YARN][test-hadoop3.2] Fix test "specify a more specific type for the ap… May 6, 2020
@tgravescs
Contributor

test this please

@SparkQA

SparkQA commented May 6, 2020

Test build #122365 has finished for PR 28456 at commit 135ac1f.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

Contributor

@jiangxb1987 jiangxb1987 left a comment

LGTM

@tgravescs
Contributor

test this please

@SparkQA

SparkQA commented May 7, 2020

Test build #122408 has finished for PR 28456 at commit 135ac1f.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@dongjoon-hyun
Member

Retest this please.

@dongjoon-hyun dongjoon-hyun changed the title [SPARK-31235][YARN][test-hadoop3.2] Fix test "specify a more specific type for the ap… [SPARK-31235][FOLLOWUP][TESTS][test-hadoop3.2] Fix test "specify a more specific type for the ap… May 7, 2020
@SparkQA

SparkQA commented May 7, 2020

Test build #122412 has finished for PR 28456 at commit 135ac1f.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@dongjoon-hyun
Member

Hi, @tgravescs . Could you review this PR?

Contributor

@tgravescs tgravescs left a comment

The changes look fine to me. I ran the tests multiple times because this seemed to be an intermittent failure.

@tgravescs
Contributor

merging this to master

@asfgit asfgit closed this in c1801fd May 8, 2020
@dongjoon-hyun
Member

Thank you so much, @tgravescs !

@dongjoon-hyun
Member

dongjoon-hyun commented May 14, 2020

Hmm, in the Maven environment, this seems to cause more UT failures in the other suites.

It seems that this has a side effect.

Test Result (11 failures / +7)
org.apache.spark.deploy.yarn.YarnClusterSuite.run Spark in yarn-client mode
org.apache.spark.deploy.yarn.YarnClusterSuite.run Spark in yarn-cluster mode with different configurations, ensuring redaction
org.apache.spark.deploy.yarn.YarnClusterSuite.yarn-cluster should respect conf overrides in SparkHadoopUtil (SPARK-16414, SPARK-23630)
org.apache.spark.deploy.yarn.YarnClusterSuite.run Spark in yarn-cluster mode with additional jar
org.apache.spark.deploy.yarn.YarnClusterSuite.run Python application in yarn-cluster mode
org.apache.spark.deploy.yarn.YarnClusterSuite.run Python application in yarn-cluster mode using spark.yarn.appMasterEnv to override local envvar
org.apache.spark.deploy.yarn.YarnClusterSuite.user class path first in cluster mode
org.apache.spark.deploy.yarn.YarnClusterSuite.executor env overwrite AM env in cluster mode
org.apache.spark.deploy.yarn.YarnShuffleAuthSuite.external shuffle service
org.apache.spark.deploy.yarn.YarnShuffleIntegrationSuite.external shuffle service

@dongjoon-hyun
Member

Although this is the first one hitting them, this may not be the root cause.

@wang-zhun
Contributor Author

Thank you, @dongjoon-hyun. It may be caused by my change; I will investigate the cause.

@dongjoon-hyun
Member

Thank you so much, @wang-zhun .

@dongjoon-hyun
Member

Unfortunately, this test case follow-up broadens the failure surface to other test suites: YarnClusterSuite, YarnShuffleAuthSuite, and YarnShuffleIntegrationSuite.

This definitely looks like some YarnClient state that is not cleaned up, but it's not clear to me. I'll revert this first to narrow the failure down to this specific test case only.

After that, if the original test case, specify a more specific type for ... , is not fixed in the short term, we may consider other approaches (ignoring it on Hadoop 3.2 or reverting the original commit).

Sorry guys.

@dongjoon-hyun
Member

For the follow-up PR, please use [test-maven][test-hadoop3.2] in the PR title, @wang-zhun .
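
For readers unfamiliar with the bracketed title tags, here is a small Java sketch that extracts the leading `[...]` groups from a PR title. It only illustrates the title convention; how the build scripts actually read the title is not shown in this thread, and `TitleTags` and `leadingTags` are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class TitleTags {
    // \G anchors each match at the end of the previous one (or at the start
    // of the input), so only the leading run of [tag] groups is collected.
    private static final Pattern TAG = Pattern.compile("\\G\\[([^\\]]+)\\]");

    // Returns the leading bracketed tags of a title, in order.
    public static List<String> leadingTags(String title) {
        List<String> tags = new ArrayList<>();
        Matcher m = TAG.matcher(title);
        while (m.find()) {
            tags.add(m.group(1));
        }
        return tags;
    }

    public static void main(String[] args) {
        // Illustrative title only; not an actual PR title from this thread.
        String title = "[SPARK-31235][test-maven][test-hadoop3.2] Fix test";
        System.out.println(leadingTags(title));
    }
}
```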

@tgravescs
Contributor

tgravescs commented May 15, 2020

That's fine. If it's causing problems, let's revert the test. We can do some other, more basic tests just to make sure the value is passed as expected, and perhaps add something in the Cluster Suite.

@dongjoon-hyun
Member

Hi, all. I made a follow-up PR. So far, it looks promising.

@dongjoon-hyun
Member

Inevitably, the test case will be disabled temporarily on the Hadoop 3.2 profile only. We will keep this test coverage on Hadoop 2.7.

dongjoon-hyun added a commit that referenced this pull request May 16, 2020
…ific type` in Hadoop-3.2

### What changes were proposed in this pull request?

This PR aims to recover Hadoop-3.2 profile jobs on the `master` branch by temporarily disabling, in Hadoop 3.2, a UT added by SPARK-31235. The target UT is not a flaky test. It currently always fails on the Hadoop-3.2 profile, although it works on the Hadoop 2.7 profile. So, in this PR, we keep the test coverage in Hadoop 2.7 and ignore the test in Hadoop 3.2 temporarily to unblock the other PRs.
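
As a rough illustration of gating a test on the Hadoop version, the Java sketch below decides whether to run a test from a version string. It is not the change made in this commit; `HadoopVersionGate`, `majorVersion`, and `shouldRunTest` are hypothetical names, and treating an unparseable version as "run the test" is an assumption.

```java
public class HadoopVersionGate {
    // Returns the major version of a string such as "3.2.0", or -1 if it
    // cannot be parsed.
    public static int majorVersion(String version) {
        if (version == null) {
            return -1;
        }
        int dot = version.indexOf('.');
        String head = dot >= 0 ? version.substring(0, dot) : version;
        try {
            return Integer.parseInt(head.trim());
        } catch (NumberFormatException e) {
            return -1;
        }
    }

    // Assumption: skip only on Hadoop 3 or later; anything else still runs,
    // so coverage on Hadoop 2.7 is kept.
    public static boolean shouldRunTest(String hadoopVersion) {
        return majorVersion(hadoopVersion) < 3;
    }

    public static void main(String[] args) {
        for (String v : new String[] {"2.7.4", "3.2.0"}) {
            System.out.println(v + " -> " + (shouldRunTest(v) ? "run" : "skip"));
        }
    }
}
```

In a real suite the version string would come from the Hadoop build in use, and the boolean would feed the test framework's own skip or assume mechanism.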

### Why are the changes needed?

SPARK-31235 added a test case that is breaking Hadoop 3.2, and there are two follow-ups to fix it. Although the two follow-ups can fix the UT in the Hadoop 3.2 environment, the side effect on Hadoop classes causes some random UT failures in the other suites.
- #28456
- #28550

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

Pass the Jenkins with SBT/Maven.

Closes #28552 from dongjoon-hyun/SPARK-31235-2.

Authored-by: Dongjoon Hyun <dongjoon@apache.org>
Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>
@wang-zhun
Contributor Author

Thank you @dongjoon-hyun
