GatewayAllocator: reset rerouting flag after error #11519

bleskes · 2015-06-05T18:22:42Z

After asynchronously fetching shard information, the gateway allocator issues a reroute via a cluster state update task. #11421 introduced an optimization that tries to avoid submitting unneeded reroutes when results for many shards come in together. It does so with a rerouting flag that indicates a reroute is already pending, so any newly arriving shard info doesn't need to issue another one. This flag wasn't reset when the reroute update task failed. Most notably, if a master node had to step down due to a min_master_node violation, it could reject an ongoing reroute. Because the flag was never reset, the node skipped every future reroute once it became master again.
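To make the failure mode concrete, here is a minimal, self-contained sketch of the pattern described above. It is not the actual `GatewayAllocator` code: `UpdateTask`, `RerouteFlagSketch`, `onShardInfoFetched`, `runPendingTasks`, `isMaster` and `reroutesRun` are hypothetical names invented for illustration, and the task queue only stands in for the cluster service. The point it illustrates is that the flag has to be cleared on the failure path as well as on the success path.

```java
import java.util.ArrayDeque;
import java.util.Queue;
import java.util.concurrent.atomic.AtomicBoolean;

// Hypothetical stand-in for a cluster state update task (not the Elasticsearch API).
interface UpdateTask {
    void execute() throws Exception;
    void onFailure(Exception e);
}

class RerouteFlagSketch {
    // True while a reroute task has been submitted but has not run yet.
    private final AtomicBoolean rerouting = new AtomicBoolean(false);
    // Simplified task queue standing in for the cluster service (assumption).
    private final Queue<UpdateTask> pending = new ArrayDeque<>();

    int reroutesRun = 0;      // number of reroutes that actually executed
    boolean isMaster = true;  // toggled to simulate stepping down as master

    // Called whenever shard info for some shard has arrived.
    void onShardInfoFetched() {
        if (!rerouting.compareAndSet(false, true)) {
            return; // a reroute is already pending, no need to submit another
        }
        pending.add(new UpdateTask() {
            @Override
            public void execute() throws Exception {
                if (!isMaster) {
                    throw new IllegalStateException("no longer master");
                }
                rerouting.set(false);
                reroutesRun++;
            }

            @Override
            public void onFailure(Exception e) {
                // Without this reset the flag stays true after a rejected
                // reroute, and every later onShardInfoFetched() is skipped.
                rerouting.set(false);
            }
        });
    }

    // Drains the queue, routing exceptions to onFailure.
    void runPendingTasks() {
        UpdateTask task;
        while ((task = pending.poll()) != null) {
            try {
                task.execute();
            } catch (Exception e) {
                task.onFailure(e);
            }
        }
    }
}
```

In this sketch, a reroute rejected while `isMaster` is false leaves the flag cleared, so the next call to `onShardInfoFetched()` after the node becomes master again submits a new reroute. Removing the reset in `onFailure` reproduces the stuck behaviour described above.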

Example failure: http://build-us-00.elastic.co/job/es_core_1x_metal/9122/testReport/junit/org.elasticsearch.cluster/MinimumMasterNodesTests/multipleNodesShutdownNonMasterNodes/

s1monw · 2015-06-05T18:37:17Z

src/test/java/org/elasticsearch/cluster/MinimumMasterNodesTests.java

s1monw: trace can go away too?

bleskes: I didn't add it but yeah...

s1monw · 2015-06-05T18:37:23Z

LGTM

kimchy · 2015-06-05T18:42:42Z

nice catch @bleskes!, LGTM

Closes #11519

bleskes added >bug v2.0.0-beta1 review v1.6.0 labels Jun 5, 2015

s1monw added the blocker label Jun 5, 2015

s1monw reviewed Jun 5, 2015
bleskes closed this in 6aa27a1 Jun 5, 2015

kevinkluge removed the review label Jun 5, 2015

lcawl added :Distributed Indexing/Distributed and removed :Allocation labels Feb 13, 2018
