@@ -26,6 +26,7 @@
 import org.elasticsearch.action.support.PlainActionFuture;
 import org.elasticsearch.cluster.health.ClusterHealthStatus;
 import org.elasticsearch.cluster.metadata.IndexMetadata;
+import org.elasticsearch.cluster.routing.UnassignedInfo;
 import org.elasticsearch.cluster.service.ClusterService;
 import org.elasticsearch.common.Priority;
 import org.elasticsearch.common.settings.Settings;
@@ -308,21 +309,26 @@ public void clusterStateProcessed(String source, ClusterState oldState, ClusterS
         }
     }
 
-    @AwaitsFix(bugUrl = "https://github.com/elastic/elasticsearch/issues/62690")
     public void testHealthOnMasterFailover() throws Exception {
         final String node = internalCluster().startDataOnlyNode();
-        boolean withIndex = randomBoolean();
+        final boolean withIndex = randomBoolean();
         if (withIndex) {
             // Create index with many shards to provoke the health request to wait (for green) while master is being shut down.
             // Notice that this is set to 0 after the test completed starting a number of health requests and master restarts.
             // This ensures that the cluster is yellow when the health request is made, making the health request wait on the observer,
             // triggering a call to observer.onClusterServiceClose when master is shutdown.
-            createIndex("test", Settings.builder().put(IndexMetadata.SETTING_NUMBER_OF_REPLICAS, randomIntBetween(0, 10)).build());
+            createIndex("test",
+                Settings.builder()
+                    .put(IndexMetadata.SETTING_NUMBER_OF_REPLICAS, randomIntBetween(0, 10))
+                    // avoid full recoveries of index, just wait for replica to reappear
+                    .put(UnassignedInfo.INDEX_DELAYED_NODE_LEFT_TIMEOUT_SETTING.getKey(), "5m")
+                    .build());
         }
         final List<ActionFuture<ClusterHealthResponse>> responseFutures = new ArrayList<>();
         // Run a few health requests concurrent to master fail-overs against a data-node to make sure master failover is handled
         // without exceptions
-        for (int i = 0; i < 20; ++i) {
+        final int iterations = withIndex ? 10 : 20;
+        for (int i = 0; i < iterations; ++i) {
             responseFutures.add(client(node).admin().cluster().prepareHealth().setWaitForEvents(Priority.LANGUID)
                 .setWaitForGreenStatus().setMasterNodeTimeout(TimeValue.timeValueMinutes(1)).execute());
Contributor:
Just for my understanding: we took more than 1 minute here to get 20 (empty) shards to recover? Isn't that indicative of some other issue?

Contributor (author):
That time could include the 10 master restarts as well as some recovery time. The health call may not be able to respond until the settings are updated below.
             internalCluster().restartNode(internalCluster().getMasterName(), InternalTestCluster.EMPTY_CALLBACK);
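
A minimal sketch of the step the author's reply refers to ("the settings are updated below"), which is not visible in this hunk: after the health requests and master restarts, the test presumably drops the replica count to 0 so the index can go green and the queued wait-for-green requests can complete. The exact code is an assumption based on the standard internal-cluster test helpers already used in the diff (client(), prepareUpdateSettings, the Settings builder), not the author's actual implementation:

        // Assumed follow-up (not shown in this diff): remove the replica requirement so the
        // pending wait-for-green health requests can finish, then drain the collected futures.
        if (withIndex) {
            client().admin().indices().prepareUpdateSettings("test")
                .setSettings(Settings.builder().put(IndexMetadata.SETTING_NUMBER_OF_REPLICAS, 0))
                .get();
        }
        for (ActionFuture<ClusterHealthResponse> responseFuture : responseFutures) {
            final ClusterHealthResponse response = responseFuture.get();
            assertFalse(response.isTimedOut());
            assertEquals(ClusterHealthStatus.GREEN, response.getStatus());
        }

This also matches the change to `iterations = withIndex ? 10 : 20` above: with an index present, only 10 health requests and master restarts are issued, which bounds the total time the futures spend waiting before the replica count is reduced.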