-
Notifications
You must be signed in to change notification settings - Fork 9.2k
YARN-10468. Fix TestNodeStatusUpdater timeouts and broken conditions #2461
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
|
🎊 +1 overall
This message was automatically generated. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We can do this with GenericTestUtils.waitFor() right?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @goiri, I added the nm.getServiceState() == STATE.STARTED because I found that the Unit test could keep waiting for the heartBeatID even after the nm fails.
The loop can be replaced with waitFor() but the conditions has to be rewritten in a way that may be confusing a little bit.
// we should not be waiting once the service stops running
GenericTestUtils.waitFor( () -> (nm.getServiceState() != STATE.STARTED) || heartBeatID > 3), 50, 1000);If you are fine with the above version, I can replace all the loops with WaitFor()
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would split it into two lines but I think the or condition is clear enough.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
waitFor()?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
same as above
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
waitFor()?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same as above
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
waitFor()?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
same as above
7f7b8ab to
b37e863
Compare
|
🎊 +1 overall
This message was automatically generated. |
|
🎊 +1 overall
This message was automatically generated. |
b66e55b to
51db2bb
Compare
|
Hi @goiri , |
|
🎊 +1 overall
This message was automatically generated. |
This PR tries to address several problems in the test class:
heartBeatID is volatile which does not guarantee atomic updates. This has been replaced with AtomicInteger.
Loop on heartBeatID values, was not checking whether the NM has failed. In that case, the test case will keep looping using resources and CPU in vain. This has been replaced by checks on the NM service status.
NM was closed twice: Once in the test method and a second time in the tearDown()
Several tests did not wait for the NM to start after calling nm.start()
stopping a service did not properly wait until the service is completely shutdown.
NOTICE
Please create an issue in ASF JIRA before opening a pull request,
and you need to set the title of the pull request which starts with
the corresponding JIRA issue number. (e.g. HADOOP-XXXXX. Fix a typo in YYY.)
For more details, please see https://cwiki.apache.org/confluence/display/HADOOP/How+To+Contribute