Skip to content

Conversation

@bleskes
Copy link
Contributor

@bleskes bleskes commented Jun 5, 2015

After asynchronously fetching shard information the gateway allocator issues a reroute via a cluster state update task. #11421 introduced an optimization trying to avoid submitting unneeded reroutes when results for many shards come in together. This is done by having a rerouting flag, indicating a pending reroute is coming and thus any new incoming shard info doesn't need to issue a reroute. This flag wasn't reset upon an error in the reroute update task. Most notably - if a master node had to step during to a min_master_node violation, it could reject an ongoing reroute. Lacking to reset the flag causing it to skip any future reroute, when the node became master again.

Example failure: http://build-us-00.elastic.co/job/es_core_1x_metal/9122/testReport/junit/org.elasticsearch.cluster/MinimumMasterNodesTests/multipleNodesShutdownNonMasterNodes/

After asynchronously fetching shard information the gateway allocator issues a reroute via  a cluster state update task. elastic#11421 introduced an optimization trying to avoid submitting unneeded reroutes when results for many shards come in together. This is done by having a rerouting flag, indicating a pending reroute is coming and thus any new incoming shard info doesn't need to issue a reroute. This flag wasn't reset upon an error in the reroute update task. Most notably - if a master node had to step during to a min_master_node violation, it could reject an ongoing reroute. Lacking to reset the flag causing it to skip any future reroute, when the node became master again.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

trace can go away too?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I didn't add it but yeah...

@s1monw
Copy link
Contributor

s1monw commented Jun 5, 2015

LGTM

@kimchy
Copy link
Member

kimchy commented Jun 5, 2015

nice catch @bleskes!, LGTM

bleskes added a commit that referenced this pull request Jun 5, 2015
After asynchronously fetching shard information the gateway allocator issues a reroute via  a cluster state update task. #11421 introduced an optimization trying to avoid submitting unneeded reroutes when results for many shards come in together. This is done by having a rerouting flag, indicating a pending reroute is coming and thus any new incoming shard info doesn't need to issue a reroute. This flag wasn't reset upon an error in the reroute update task. Most notably - if a master node had to step during to a min_master_node violation, it could reject an ongoing reroute. Lacking to reset the flag causing it to skip any future reroute, when the node became master again.

Closes #11519
bleskes added a commit that referenced this pull request Jun 5, 2015
After asynchronously fetching shard information the gateway allocator issues a reroute via a cluster state update task. #11421 introduced an optimization trying to avoid submitting unneeded reroutes when results for many shards come in together. This is done by having a rerouting flag, indicating a pending reroute is coming and thus any new incoming shard info doesn't need to issue a reroute. This flag wasn't reset upon an error in the reroute update task. Most notably - if a master node had to step during to a min_master_node violation, it could reject an ongoing reroute. Lacking to reset the flag causing it to skip any future reroute, when the node became master again.

Closes #11519
@bleskes bleskes closed this in 6aa27a1 Jun 5, 2015
@kevinkluge kevinkluge removed the review label Jun 5, 2015
@lcawl lcawl added :Distributed Indexing/Distributed A catch all label for anything in the Distributed Indexing Area. Please avoid if you can. and removed :Allocation labels Feb 13, 2018
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

blocker >bug :Distributed Indexing/Distributed A catch all label for anything in the Distributed Indexing Area. Please avoid if you can. v1.6.0 v2.0.0-beta1

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants