-
Notifications
You must be signed in to change notification settings - Fork 25.7k
Fix ShrinkIndexIT #44214
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix ShrinkIndexIT #44214
Conversation
* Move this test suit to cluster scope. Currently, `testShrinkThenSplitWithFailedNode` stops a random node which randomly turns out to be the only shared master node so the cluster reset fails on account of the fact that no shared master node survived. * Closes #44164
|
Pinging @elastic/es-distributed |
DaveCTurner
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good catch.
I note that the CI issue was complaining about multiple tests failing - is this because the reset() is associated with the test after the one that broke the cluster? If so, can we catch this in the right test in future by asserting in stopRandomDataNode that it isn't stopping the last-remaining shared master-eligible node (if autoManageMasterNodes == true at least)?
Also can we fix this while keeping a suite-wide cluster by choosing a node other than the unique shared master-eligible node in the offending test?
Yes.
Sweet idea, done in 4a59fc0, that does indeed fail a lot nicer :)
Sure, also done in 4a59fc0. Just fired up a new data only node for this. As far as I can see that doesn't change the test's behavior and makes things super safe without complicating things? |
DaveCTurner
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM thanks for the second round @original-brownbear.
|
thanks @DaveCTurner ! |
* Fix ShrinkIndexIT * Move this test suit to cluster scope. Currently, `testShrinkThenSplitWithFailedNode` stops a random node which randomly turns out to be the only shared master node so the cluster reset fails on account of the fact that no shared master node survived. * Closes #44164
* The assertion added in elastic#44214 is tripped by tests running dedicated test clusters per test needlessly.This breaks existing tests like the one in elastic#44245. * Closes elastic#44245
testShrinkThenSplitWithFailedNodestops a random node which randomly turns out to be the only shared master node so the cluster reset fails on account of the fact that no shared master node survived.@DaveCTurner can you take a look since you added that assertion in reset in 034c765? :)