YARN-10468. Fix TestNodeStatusUpdater timeouts and broken conditions #2461

amahussein · 2020-11-12T06:58:58Z

This PR tries to address several problems in the test class:

If the NodeManager fails, some loops keep spinning forever because they have no timeout. Loops like the following example were replaced by waits bounded by a timeout.

while (heartBeatID < 12) {
  Thread.sleep(1000);
}

heartBeatID was declared volatile, which guarantees visibility but not atomic read-modify-write updates such as heartBeatID++. It has been replaced with an AtomicInteger.
Loops on heartBeatID values did not check whether the NM had failed. In that case, the test kept looping, wasting CPU and other resources. These loops now also check the NM service state.
The NM was closed twice: once in the test method and a second time in tearDown().
Several tests did not wait for the NM to start after calling nm.start().
Stopping a service did not properly wait until the service had completely shut down.
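
The difference between a volatile counter and an AtomicInteger can be illustrated with a small standalone sketch. It is not taken from TestNodeStatusUpdater: the class name, field names, and thread counts are hypothetical, and the PR does not state whether the test ever had more than one thread writing heartBeatID. The sketch only shows the general Java behavior that `++` on a volatile int is a separate read and write, so concurrent increments may be lost, while AtomicInteger.incrementAndGet() is a single atomic step.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.atomic.AtomicInteger;

/** Illustrative only; the class and field names are hypothetical. */
class HeartbeatCounterDemo {
  // volatile makes each write visible to other threads, but ++ is still read-then-write.
  static volatile int volatileCounter = 0;
  static final AtomicInteger atomicCounter = new AtomicInteger();

  /** Starts `threads` workers that each increment both counters `perThread` times. */
  static void run(int threads, int perThread) {
    volatileCounter = 0;
    atomicCounter.set(0);
    CountDownLatch start = new CountDownLatch(1);
    Thread[] workers = new Thread[threads];
    for (int t = 0; t < threads; t++) {
      workers[t] = new Thread(() -> {
        try {
          start.await(); // line all workers up to increase contention
        } catch (InterruptedException e) {
          Thread.currentThread().interrupt();
          return;
        }
        for (int i = 0; i < perThread; i++) {
          volatileCounter++;               // may lose updates under contention
          atomicCounter.incrementAndGet(); // atomic read-modify-write
        }
      });
      workers[t].start();
    }
    start.countDown();
    for (Thread w : workers) {
      try {
        w.join(); // join also makes the workers' writes visible to this thread
      } catch (InterruptedException e) {
        Thread.currentThread().interrupt();
        throw new IllegalStateException("interrupted while joining workers", e);
      }
    }
  }

  public static void main(String[] args) {
    run(4, 100_000);
    System.out.println("expected = " + 4 * 100_000);
    System.out.println("atomic   = " + atomicCounter.get());
    System.out.println("volatile = " + volatileCounter + " (may be lower)");
  }
}
```

The atomic counter always reaches the expected total. The volatile counter can fall short, although on a given run it may happen to match.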


hadoop-yetus · 2020-11-12T08:57:36Z

🎊 +1 overall

Vote	Subsystem	Runtime	Logfile	Comment
+0 🆗	reexec	0m 47s		Docker mode activated.
			_ Prechecks _
+1 💚	dupname	0m 0s		No case conflicting files found.
+1 💚	@author	0m 0s		The patch does not contain any @author tags.
+1 💚		0m 0s	test4tests	The patch appears to include 1 new or modified test files.
			_ trunk Compile Tests _
+1 💚	mvninstall	39m 33s		trunk passed
+1 💚	compile	1m 29s		trunk passed with JDK Ubuntu-11.0.9+11-Ubuntu-0ubuntu1.18.04.1
+1 💚	compile	1m 21s		trunk passed with JDK Private Build-1.8.0_272-8u272-b10-0ubuntu1~18.04-b10
+1 💚	checkstyle	0m 35s		trunk passed
+1 💚	mvnsite	0m 49s		trunk passed
+1 💚	shadedclient	19m 59s		branch has no errors when building and testing our client artifacts.
+1 💚	javadoc	0m 44s		trunk passed with JDK Ubuntu-11.0.9+11-Ubuntu-0ubuntu1.18.04.1
+1 💚	javadoc	0m 46s		trunk passed with JDK Private Build-1.8.0_272-8u272-b10-0ubuntu1~18.04-b10
+0 🆗	spotbugs	1m 49s		Used deprecated FindBugs config; considering switching to SpotBugs.
+1 💚	findbugs	1m 40s		trunk passed
			_ Patch Compile Tests _
+1 💚	mvninstall	0m 43s		the patch passed
+1 💚	compile	1m 24s		the patch passed with JDK Ubuntu-11.0.9+11-Ubuntu-0ubuntu1.18.04.1
+1 💚	javac	1m 24s		the patch passed
+1 💚	compile	1m 18s		the patch passed with JDK Private Build-1.8.0_272-8u272-b10-0ubuntu1~18.04-b10
+1 💚	javac	1m 18s		the patch passed
-0 ⚠️	checkstyle	0m 26s	/diff-checkstyle-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-nodemanager.txt	hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager: The patch generated 3 new + 95 unchanged - 2 fixed = 98 total (was 97)
+1 💚	mvnsite	0m 41s		the patch passed
+1 💚	whitespace	0m 0s		The patch has no whitespace issues.
+1 💚	shadedclient	17m 3s		patch has no errors when building and testing our client artifacts.
+1 💚	javadoc	0m 32s		the patch passed with JDK Ubuntu-11.0.9+11-Ubuntu-0ubuntu1.18.04.1
+1 💚	javadoc	0m 29s		the patch passed with JDK Private Build-1.8.0_272-8u272-b10-0ubuntu1~18.04-b10
+1 💚	findbugs	1m 23s		the patch passed
			_ Other Tests _
+1 💚	unit	22m 32s		hadoop-yarn-server-nodemanager in the patch passed.
+1 💚	asflicense	0m 32s		The patch does not generate ASF License warnings.
		117m 24s

Subsystem	Report/Notes
Docker	ClientAPI=1.40 ServerAPI=1.40 base: https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-2461/1/artifact/out/Dockerfile
GITHUB PR	#2461
Optional Tests	dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient findbugs checkstyle
uname	Linux eaced1916b4e 4.15.0-112-generic #113-Ubuntu SMP Thu Jul 9 23:41:39 UTC 2020 x86_64 x86_64 x86_64 GNU/Linux
Build tool	maven
Personality	dev-support/bin/hadoop.sh
git revision	trunk / `fc961b6`
Default Java	Private Build-1.8.0_272-8u272-b10-0ubuntu1~18.04-b10
Multi-JDK versions	/usr/lib/jvm/java-11-openjdk-amd64:Ubuntu-11.0.9+11-Ubuntu-0ubuntu1.18.04.1 /usr/lib/jvm/java-8-openjdk-amd64:Private Build-1.8.0_272-8u272-b10-0ubuntu1~18.04-b10
Test Results	https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-2461/1/testReport/
Max. process+thread count	659 (vs. ulimit of 5500)
modules	C: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager U: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager
Console output	https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-2461/1/console
versions	git=2.17.1 maven=3.6.0 findbugs=4.0.6
Powered by	Apache Yetus 0.13.0-SNAPSHOT https://yetus.apache.org

This message was automatically generated.

goiri · 2020-11-12T17:57:25Z

...demanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/TestNodeStatusUpdater.java

We can do this with GenericTestUtils.waitFor(), right?

Thanks @goiri, I added the nm.getServiceState() == STATE.STARTED check because I found that the unit test could keep waiting for the heartBeatID even after the NM fails.
The loop can be replaced with waitFor(), but the condition has to be rewritten in a way that may be a little confusing.

// we should not be waiting once the service stops running
GenericTestUtils.waitFor(() -> nm.getServiceState() != STATE.STARTED || heartBeatID.get() > 3, 50, 1000);

If you are fine with the above version, I can replace all the loops with waitFor().

I would split it into two lines, but I think the OR condition is clear enough.
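
The combined condition discussed above can be sketched without any Hadoop dependency. Everything in the block is a hypothetical stand-in: `waitFor` here is a local helper that only mimics the polling style of GenericTestUtils.waitFor(), and `FakeNodeManager`, `State`, and `Outcome` are invented for illustration. The point of the sketch is that an OR condition ends the wait for either reason, so the caller checks afterwards which side actually fired.

```java
import java.util.concurrent.TimeoutException;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.BooleanSupplier;

/** Illustrative sketch; every name here is hypothetical, not Hadoop API. */
class BoundedWaitDemo {
  enum State { STARTED, STOPPED }
  enum Outcome { HEARTBEATS_SEEN, SERVICE_NOT_RUNNING, TIMED_OUT }

  /** Minimal fake of a service exposing a lifecycle state and a heartbeat counter. */
  static final class FakeNodeManager {
    volatile State state = State.STARTED;
    final AtomicInteger heartBeatID = new AtomicInteger();
  }

  /** Polls check every checkEveryMillis until it is true or waitForMillis elapses. */
  static void waitFor(BooleanSupplier check, long checkEveryMillis, long waitForMillis)
      throws TimeoutException, InterruptedException {
    long deadline = System.nanoTime() + waitForMillis * 1_000_000L;
    while (!check.getAsBoolean()) {
      if (System.nanoTime() - deadline >= 0) {
        throw new TimeoutException("condition not met within " + waitForMillis + " ms");
      }
      Thread.sleep(checkEveryMillis);
    }
  }

  /** Waits for more than target heartbeats, giving up early if the service is not running. */
  static Outcome awaitHeartbeats(FakeNodeManager nm, int target, long timeoutMillis) {
    try {
      waitFor(() -> nm.state != State.STARTED || nm.heartBeatID.get() > target,
          10, timeoutMillis);
    } catch (TimeoutException e) {
      return Outcome.TIMED_OUT;
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
      return Outcome.TIMED_OUT; // simplification: an interrupt is reported as a timeout
    }
    // The OR condition does not say which side ended the wait, so check explicitly.
    return nm.heartBeatID.get() > target
        ? Outcome.HEARTBEATS_SEEN : Outcome.SERVICE_NOT_RUNNING;
  }

  public static void main(String[] args) {
    // Healthy service: a background thread produces heartbeats.
    FakeNodeManager healthy = new FakeNodeManager();
    Thread beats = new Thread(() -> {
      for (int i = 0; i < 10; i++) {
        healthy.heartBeatID.incrementAndGet();
        try {
          Thread.sleep(20);
        } catch (InterruptedException e) {
          return;
        }
      }
    });
    beats.setDaemon(true);
    beats.start();
    System.out.println(awaitHeartbeats(healthy, 3, 1000));

    // Failed service: the wait ends on the first poll instead of running to the timeout.
    FakeNodeManager failed = new FakeNodeManager();
    failed.state = State.STOPPED;
    System.out.println(awaitHeartbeats(failed, 3, 1000));
  }
}
```

In a real test, the explicit check after the wait would typically be an assertion on the service state or the counter, so that a failed NM produces a clear test failure rather than a silent pass.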

goiri · 2020-11-12T17:58:28Z

...demanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/TestNodeStatusUpdater.java

same as above

goiri · 2020-11-12T17:58:34Z

...demanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/TestNodeStatusUpdater.java

Same as above

goiri · 2020-11-12T17:59:07Z

...demanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/TestNodeStatusUpdater.java

same as above

hadoop-yetus · 2020-11-12T20:10:43Z

🎊 +1 overall

Vote	Subsystem	Runtime	Logfile	Comment
+0 🆗	reexec	1m 23s		Docker mode activated.
			_ Prechecks _
+1 💚	dupname	0m 0s		No case conflicting files found.
+1 💚	@author	0m 0s		The patch does not contain any @author tags.
+1 💚		0m 0s	test4tests	The patch appears to include 1 new or modified test files.
			_ trunk Compile Tests _
+1 💚	mvninstall	38m 14s		trunk passed
+1 💚	compile	1m 18s		trunk passed with JDK Ubuntu-11.0.9+11-Ubuntu-0ubuntu1.18.04.1
+1 💚	compile	1m 10s		trunk passed with JDK Private Build-1.8.0_272-8u272-b10-0ubuntu1~18.04-b10
+1 💚	checkstyle	0m 29s		trunk passed
+1 💚	mvnsite	0m 42s		trunk passed
+1 💚	shadedclient	19m 21s		branch has no errors when building and testing our client artifacts.
+1 💚	javadoc	0m 35s		trunk passed with JDK Ubuntu-11.0.9+11-Ubuntu-0ubuntu1.18.04.1
+1 💚	javadoc	0m 30s		trunk passed with JDK Private Build-1.8.0_272-8u272-b10-0ubuntu1~18.04-b10
+0 🆗	spotbugs	1m 33s		Used deprecated FindBugs config; considering switching to SpotBugs.
+1 💚	findbugs	1m 27s		trunk passed
			_ Patch Compile Tests _
+1 💚	mvninstall	0m 43s		the patch passed
+1 💚	compile	1m 17s		the patch passed with JDK Ubuntu-11.0.9+11-Ubuntu-0ubuntu1.18.04.1
+1 💚	javac	1m 17s		the patch passed
+1 💚	compile	1m 5s		the patch passed with JDK Private Build-1.8.0_272-8u272-b10-0ubuntu1~18.04-b10
+1 💚	javac	1m 5s		the patch passed
-0 ⚠️	checkstyle	0m 24s	/diff-checkstyle-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-nodemanager.txt	hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager: The patch generated 1 new + 94 unchanged - 3 fixed = 95 total (was 97)
+1 💚	mvnsite	0m 41s		the patch passed
+1 💚	whitespace	0m 0s		The patch has no whitespace issues.
+1 💚	shadedclient	17m 33s		patch has no errors when building and testing our client artifacts.
+1 💚	javadoc	0m 34s		the patch passed with JDK Ubuntu-11.0.9+11-Ubuntu-0ubuntu1.18.04.1
+1 💚	javadoc	0m 31s		the patch passed with JDK Private Build-1.8.0_272-8u272-b10-0ubuntu1~18.04-b10
+1 💚	findbugs	1m 44s		the patch passed
			_ Other Tests _
+1 💚	unit	22m 16s		hadoop-yarn-server-nodemanager in the patch passed.
+1 💚	asflicense	0m 29s		The patch does not generate ASF License warnings.
		114m 26s

Subsystem	Report/Notes
Docker	ClientAPI=1.40 ServerAPI=1.40 base: https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-2461/2/artifact/out/Dockerfile
GITHUB PR	#2461
Optional Tests	dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient findbugs checkstyle
uname	Linux 71d0ff64e91e 4.15.0-112-generic #113-Ubuntu SMP Thu Jul 9 23:41:39 UTC 2020 x86_64 x86_64 x86_64 GNU/Linux
Build tool	maven
Personality	dev-support/bin/hadoop.sh
git revision	trunk / `fc961b6`
Default Java	Private Build-1.8.0_272-8u272-b10-0ubuntu1~18.04-b10
Multi-JDK versions	/usr/lib/jvm/java-11-openjdk-amd64:Ubuntu-11.0.9+11-Ubuntu-0ubuntu1.18.04.1 /usr/lib/jvm/java-8-openjdk-amd64:Private Build-1.8.0_272-8u272-b10-0ubuntu1~18.04-b10
Test Results	https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-2461/2/testReport/
Max. process+thread count	511 (vs. ulimit of 5500)
modules	C: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager U: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager
Console output	https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-2461/2/console
versions	git=2.17.1 maven=3.6.0 findbugs=4.0.6
Powered by	Apache Yetus 0.13.0-SNAPSHOT https://yetus.apache.org

This message was automatically generated.

hadoop-yetus · 2020-11-12T22:47:18Z

🎊 +1 overall

Vote	Subsystem	Runtime	Logfile	Comment
+0 🆗	reexec	1m 36s		Docker mode activated.
			_ Prechecks _
+1 💚	dupname	0m 0s		No case conflicting files found.
+1 💚	@author	0m 0s		The patch does not contain any @author tags.
+1 💚		0m 0s	test4tests	The patch appears to include 1 new or modified test files.
			_ trunk Compile Tests _
+1 💚	mvninstall	39m 8s		trunk passed
+1 💚	compile	1m 24s		trunk passed with JDK Ubuntu-11.0.9+11-Ubuntu-0ubuntu1.18.04.1
+1 💚	compile	1m 17s		trunk passed with JDK Private Build-1.8.0_272-8u272-b10-0ubuntu1~18.04-b10
+1 💚	checkstyle	0m 30s		trunk passed
+1 💚	mvnsite	0m 46s		trunk passed
+1 💚	shadedclient	18m 59s		branch has no errors when building and testing our client artifacts.
+1 💚	javadoc	0m 33s		trunk passed with JDK Ubuntu-11.0.9+11-Ubuntu-0ubuntu1.18.04.1
+1 💚	javadoc	0m 28s		trunk passed with JDK Private Build-1.8.0_272-8u272-b10-0ubuntu1~18.04-b10
+0 🆗	spotbugs	1m 28s		Used deprecated FindBugs config; considering switching to SpotBugs.
+1 💚	findbugs	1m 26s		trunk passed
			_ Patch Compile Tests _
+1 💚	mvninstall	0m 37s		the patch passed
+1 💚	compile	1m 8s		the patch passed with JDK Ubuntu-11.0.9+11-Ubuntu-0ubuntu1.18.04.1
+1 💚	javac	1m 8s		the patch passed
+1 💚	compile	1m 2s		the patch passed with JDK Private Build-1.8.0_272-8u272-b10-0ubuntu1~18.04-b10
+1 💚	javac	1m 2s		the patch passed
+1 💚	checkstyle	0m 22s		hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager: The patch generated 0 new + 86 unchanged - 11 fixed = 86 total (was 97)
+1 💚	mvnsite	0m 34s		the patch passed
+1 💚	whitespace	0m 0s		The patch has no whitespace issues.
+1 💚	shadedclient	16m 57s		patch has no errors when building and testing our client artifacts.
+1 💚	javadoc	0m 32s		the patch passed with JDK Ubuntu-11.0.9+11-Ubuntu-0ubuntu1.18.04.1
+1 💚	javadoc	0m 27s		the patch passed with JDK Private Build-1.8.0_272-8u272-b10-0ubuntu1~18.04-b10
+1 💚	findbugs	1m 38s		the patch passed
			_ Other Tests _
+1 💚	unit	22m 40s		hadoop-yarn-server-nodemanager in the patch passed.
+1 💚	asflicense	0m 30s		The patch does not generate ASF License warnings.
		114m 25s

Subsystem	Report/Notes
Docker	ClientAPI=1.40 ServerAPI=1.40 base: https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-2461/3/artifact/out/Dockerfile
GITHUB PR	#2461
Optional Tests	dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient findbugs checkstyle
uname	Linux 2d37d82cd233 4.15.0-112-generic #113-Ubuntu SMP Thu Jul 9 23:41:39 UTC 2020 x86_64 x86_64 x86_64 GNU/Linux
Build tool	maven
Personality	dev-support/bin/hadoop.sh
git revision	trunk / `5ce1810`
Default Java	Private Build-1.8.0_272-8u272-b10-0ubuntu1~18.04-b10
Multi-JDK versions	/usr/lib/jvm/java-11-openjdk-amd64:Ubuntu-11.0.9+11-Ubuntu-0ubuntu1.18.04.1 /usr/lib/jvm/java-8-openjdk-amd64:Private Build-1.8.0_272-8u272-b10-0ubuntu1~18.04-b10
Test Results	https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-2461/3/testReport/
Max. process+thread count	552 (vs. ulimit of 5500)
modules	C: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager U: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager
Console output	https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-2461/3/console
versions	git=2.17.1 maven=3.6.0 findbugs=4.0.6
Powered by	Apache Yetus 0.13.0-SNAPSHOT https://yetus.apache.org

This message was automatically generated.

amahussein · 2020-11-24T16:35:34Z

Hi @goiri ,
I rebased the branch. I believe this change is ready to be merged if you are okay with it.

hadoop-yetus · 2020-11-24T19:03:05Z

🎊 +1 overall

Vote	Subsystem	Runtime	Logfile	Comment
+0 🆗	reexec	39m 26s		Docker mode activated.
			_ Prechecks _
+1 💚	dupname	0m 0s		No case conflicting files found.
+1 💚	@author	0m 0s		The patch does not contain any @author tags.
+1 💚		0m 0s	test4tests	The patch appears to include 1 new or modified test files.
			_ trunk Compile Tests _
+1 💚	mvninstall	36m 54s		trunk passed
+1 💚	compile	1m 12s		trunk passed with JDK Ubuntu-11.0.9.1+1-Ubuntu-0ubuntu1.18.04
+1 💚	compile	1m 5s		trunk passed with JDK Private Build-1.8.0_275-8u275-b01-0ubuntu1~18.04-b01
+1 💚	checkstyle	0m 30s		trunk passed
+1 💚	mvnsite	0m 42s		trunk passed
+1 💚	shadedclient	18m 4s		branch has no errors when building and testing our client artifacts.
+1 💚	javadoc	0m 32s		trunk passed with JDK Ubuntu-11.0.9.1+1-Ubuntu-0ubuntu1.18.04
+1 💚	javadoc	0m 29s		trunk passed with JDK Private Build-1.8.0_275-8u275-b01-0ubuntu1~18.04-b01
+0 🆗	spotbugs	1m 23s		Used deprecated FindBugs config; considering switching to SpotBugs.
+1 💚	findbugs	1m 21s		trunk passed
			_ Patch Compile Tests _
+1 💚	mvninstall	0m 36s		the patch passed
+1 💚	compile	1m 8s		the patch passed with JDK Ubuntu-11.0.9.1+1-Ubuntu-0ubuntu1.18.04
+1 💚	javac	1m 8s		the patch passed
+1 💚	compile	1m 0s		the patch passed with JDK Private Build-1.8.0_275-8u275-b01-0ubuntu1~18.04-b01
+1 💚	javac	1m 0s		the patch passed
+1 💚	checkstyle	0m 23s		hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager: The patch generated 0 new + 86 unchanged - 11 fixed = 86 total (was 97)
+1 💚	mvnsite	0m 34s		the patch passed
+1 💚	whitespace	0m 0s		The patch has no whitespace issues.
+1 💚	shadedclient	16m 45s		patch has no errors when building and testing our client artifacts.
+1 💚	javadoc	0m 29s		the patch passed with JDK Ubuntu-11.0.9.1+1-Ubuntu-0ubuntu1.18.04
+1 💚	javadoc	0m 26s		the patch passed with JDK Private Build-1.8.0_275-8u275-b01-0ubuntu1~18.04-b01
+1 💚	findbugs	1m 23s		the patch passed
			_ Other Tests _
+1 💚	unit	22m 3s		hadoop-yarn-server-nodemanager in the patch passed.
+1 💚	asflicense	0m 30s		The patch does not generate ASF License warnings.
		147m 31s

Subsystem	Report/Notes
Docker	ClientAPI=1.40 ServerAPI=1.40 base: https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-2461/4/artifact/out/Dockerfile
GITHUB PR	#2461
Optional Tests	dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient findbugs checkstyle
uname	Linux 418ea8ab589a 4.15.0-112-generic #113-Ubuntu SMP Thu Jul 9 23:41:39 UTC 2020 x86_64 x86_64 x86_64 GNU/Linux
Build tool	maven
Personality	dev-support/bin/hadoop.sh
git revision	trunk / `f813f14`
Default Java	Private Build-1.8.0_275-8u275-b01-0ubuntu1~18.04-b01
Multi-JDK versions	/usr/lib/jvm/java-11-openjdk-amd64:Ubuntu-11.0.9.1+1-Ubuntu-0ubuntu1.18.04 /usr/lib/jvm/java-8-openjdk-amd64:Private Build-1.8.0_275-8u275-b01-0ubuntu1~18.04-b01
Test Results	https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-2461/4/testReport/
Max. process+thread count	590 (vs. ulimit of 5500)
modules	C: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager U: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager
Console output	https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-2461/4/console
versions	git=2.17.1 maven=3.6.0 findbugs=4.0.6
Powered by	Apache Yetus 0.13.0-SNAPSHOT https://yetus.apache.org

This message was automatically generated.

goiri approved these changes Nov 12, 2020

amahussein force-pushed the yarn-10468 branch from 7f7b8ab to b37e863 Compare November 12, 2020 18:15

amahussein added 2 commits November 24, 2020 10:33

YARN-10468. Fix TestNodeStatusUpdater timeouts and broken conditions

bffc68c

YARN-10468. use WaitFor throughout the test

51db2bb

amahussein force-pushed the yarn-10468 branch from b66e55b to 51db2bb Compare November 24, 2020 16:34

goiri merged commit 569b20e into apache:trunk Nov 24, 2020
