@original-brownbear
Contributor

@DaveCTurner can you take a look since you added that assertion in reset in 034c765? :)

* Move this test suite to cluster scope. Currently, `testShrinkThenSplitWithFailedNode` stops a random node, which sometimes turns out to be the only shared master node, so the cluster reset fails because no shared master node survived.
* Closes #44164
@original-brownbear added the `>test`, `:Distributed Indexing/Distributed`, `v8.0.0`, and `v7.4.0` labels on Jul 11, 2019
@elasticmachine
Collaborator

Pinging @elastic/es-distributed

Contributor

@DaveCTurner left a comment

Good catch.

I note that the CI issue was complaining about multiple tests failing - is this because the reset() is associated with the test after the one that broke the cluster? If so, can we catch this in the right test in future by asserting in stopRandomDataNode that it isn't stopping the last-remaining shared master-eligible node (if autoManageMasterNodes == true at least)?

Also can we fix this while keeping a suite-wide cluster by choosing a node other than the unique shared master-eligible node in the offending test?

@original-brownbear
Contributor Author

@DaveCTurner

> I note that the CI issue was complaining about multiple tests failing - is this because the reset() is associated with the test after the one that broke the cluster?

Yes.

> If so, can we catch this in the right test in future by asserting in stopRandomDataNode that it isn't stopping the last-remaining shared master-eligible node (if autoManageMasterNodes == true at least)?

Sweet idea, done in 4a59fc0, that does indeed fail a lot nicer :)

> Also can we fix this while keeping a suite-wide cluster by choosing a node other than the unique shared master-eligible node in the offending test?

Sure, also done in 4a59fc0. Just fired up a new data-only node for this. As far as I can see, that doesn't change the test's behavior and makes things super safe without complicating things?
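
For readers skimming the thread, here is a minimal, self-contained sketch of the two ideas discussed above: fail fast when a test tries to stop the last master-eligible node, and have the test stop a data-only node that it started itself. This is a toy model, not the code from 4a59fc0 and not Elasticsearch's test framework; `SharedClusterSketch`, `Node`, `startNode`, `stopNode`, and the error message are all hypothetical names chosen for illustration.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model of a shared (suite-wide) test cluster. Every name here is
// hypothetical; this is not Elasticsearch's actual test infrastructure.
class SharedClusterSketch {

    // A node with only the role flag that matters for this discussion.
    static final class Node {
        final String name;
        final boolean masterEligible;

        Node(String name, boolean masterEligible) {
            this.name = name;
            this.masterEligible = masterEligible;
        }
    }

    private final List<Node> nodes = new ArrayList<>();

    Node startNode(String name, boolean masterEligible) {
        Node node = new Node(name, masterEligible);
        nodes.add(node);
        return node;
    }

    long masterEligibleCount() {
        return nodes.stream().filter(n -> n.masterEligible).count();
    }

    int size() {
        return nodes.size();
    }

    // Fail in the test that stops the node, rather than in a later reset.
    void stopNode(Node node) {
        if (!nodes.contains(node)) {
            throw new IllegalArgumentException("unknown node [" + node.name + "]");
        }
        if (node.masterEligible && masterEligibleCount() == 1) {
            throw new AssertionError(
                "refusing to stop the last master-eligible node [" + node.name + "]");
        }
        nodes.remove(node);
    }

    public static void main(String[] args) {
        SharedClusterSketch cluster = new SharedClusterSketch();
        Node shared = cluster.startNode("shared-master", true);

        // The test starts its own data-only node and stops that one instead.
        Node victim = cluster.startNode("extra-data-only", false);
        cluster.stopNode(victim);
        System.out.println("master-eligible nodes left: " + cluster.masterEligibleCount());

        // Stopping the only master-eligible node trips the guard immediately.
        try {
            cluster.stopNode(shared);
        } catch (AssertionError e) {
            System.out.println("assertion tripped: " + e.getMessage());
        }
    }
}
```

The point of the guard is locality: the failure is reported by the test that broke the cluster instead of by whichever test runs the next reset.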

Contributor

@DaveCTurner left a comment

LGTM, thanks for the second round @original-brownbear.

@original-brownbear
Contributor Author

thanks @DaveCTurner !

@original-brownbear merged commit a052067 into elastic:master on Jul 11, 2019
@original-brownbear deleted the 44164 branch on July 11, 2019, 13:49
original-brownbear added a commit that referenced this pull request Jul 11, 2019
* Fix ShrinkIndexIT

* Move this test suite to cluster scope. Currently, `testShrinkThenSplitWithFailedNode` stops a random node, which sometimes turns out to be the only shared master node, so the cluster reset fails because no shared master node survived.
* Closes #44164
pull bot pushed a commit to sadlil/elasticsearch that referenced this pull request Jul 12, 2019
* The assertion added in elastic#44214 is tripped by tests running dedicated test clusters per test needlessly. This breaks existing tests like the one in elastic#44245.
* Closes elastic#44245
original-brownbear added a commit that referenced this pull request Jul 12, 2019
* The assertion added in #44214 is tripped by tests running dedicated test clusters per test needlessly. This breaks existing tests like the one in #44245.
* Closes #44245
Development

Successfully merging this pull request may close these issues.

[CI] Various tests in ShrinkIndexIT fail with "expected at least one master-eligible node left"
