Prevent allocating shards to broken nodes #18417

Closed
ywelsch opened this issue May 17, 2016 · 5 comments
Labels
>bug · :Distributed Coordination/Allocation · Team:Distributed

Comments

ywelsch (Contributor) commented May 17, 2016

Allocating shards to a node can fail for various reasons. When an allocation fails, we currently ignore the node for that shard during the next allocation round. However, this means that:

  • subsequent rounds consider the node for allocating the shard again.
  • other shards are still allocated to the node (in particular, the balancer tries to put shards onto the node with the failed shard, as its weight becomes smaller).

This is particularly bad if the node is permanently broken, leading to a never-ending series of failed allocations. Ultimately this affects the stability of the cluster.
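
The loop can be illustrated with a toy simulation (this is not Elasticsearch's routing code; the class, method and node names below are invented for the example). Because the failure is only remembered within a single allocation round, a permanently broken node keeps being picked again:

```java
// Illustrative toy model of the behaviour described above, not Elasticsearch's
// actual routing code; all names here are made up for the example.
import java.util.HashSet;
import java.util.Set;

public class AllocationRetryLoop {

    // Pretend this node has a dead disk, so every allocation attempt fails.
    static boolean tryAllocate(String nodeId) {
        return !"broken-node".equals(nodeId);
    }

    public static void main(String[] args) {
        for (int round = 1; round <= 3; round++) {
            // The "ignore" set is rebuilt for every allocation round, so the
            // failure from the previous round is forgotten here.
            Set<String> ignoredForThisShard = new HashSet<>();
            String chosenNode = "broken-node"; // the balancer keeps preferring the underweight node
            if (!ignoredForThisShard.contains(chosenNode) && !tryAllocate(chosenNode)) {
                ignoredForThisShard.add(chosenNode); // only helps for the rest of this round
                System.out.println("round " + round + ": allocation failed on " + chosenNode + ", will retry next round");
            }
        }
    }
}
```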
s1monw (Contributor) commented May 17, 2016

@ywelsch I think we can approach this from multiple directions.

  • we can start bottom-up and check that a data path is writeable before we allocate a shard, and skip it if possible (that would help if someone loses a disk and has multiple)
  • we can also have a simple allocation_failed counter on UnassignedInfo to prevent endless allocation of a potentially broken index (metadata / settings / whatever is broken)
  • we might also be able to use a simple counter of failed allocations per node that we can reset once we have had a successful one on that node. We can then also have a simple allocation decider that throttles that node or takes it out of the loop entirely once the counter goes beyond a threshold?

I think in all of these cases simplicity wins over complex state... my $0.05
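
A minimal sketch of the third idea, assuming a hypothetical per-node counter and a simplified YES/THROTTLE/NO decision type rather than the real AllocationDecider API:

```java
// Hypothetical per-node failure counter feeding a decider; the Decision enum and
// class names are stand-ins, not the real Elasticsearch AllocationDecider API.
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

public class PerNodeFailureDecider {

    enum Decision { YES, THROTTLE, NO }

    private final Map<String, Integer> failuresPerNode = new ConcurrentHashMap<>();
    private final int maxFailures;

    public PerNodeFailureDecider(int maxFailures) {
        this.maxFailures = maxFailures;
    }

    public void onAllocationFailed(String nodeId) {
        failuresPerNode.merge(nodeId, 1, Integer::sum);
    }

    public void onAllocationSucceeded(String nodeId) {
        failuresPerNode.remove(nodeId); // reset the counter after a successful allocation on that node
    }

    public Decision canAllocate(String nodeId) {
        int failures = failuresPerNode.getOrDefault(nodeId, 0);
        if (failures >= maxFailures) {
            return Decision.NO;       // take the node out of the loop entirely
        }
        if (failures > 0) {
            return Decision.THROTTLE; // slow down while the node looks suspicious
        }
        return Decision.YES;
    }

    public static void main(String[] args) {
        PerNodeFailureDecider decider = new PerNodeFailureDecider(3);
        decider.onAllocationFailed("node-1");
        System.out.println(decider.canAllocate("node-1")); // THROTTLE
        decider.onAllocationFailed("node-1");
        decider.onAllocationFailed("node-1");
        System.out.println(decider.canAllocate("node-1")); // NO
        decider.onAllocationSucceeded("node-1");
        System.out.println(decider.canAllocate("node-1")); // YES
    }
}
```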

s1monw added a commit to s1monw/elasticsearch that referenced this issue May 19, 2016
Today if a shard fails during the initialization phase due to misconfiguration, broken disks,
missing analyzers, not-installed plugins etc., Elasticsearch keeps on trying to initialize,
or rather allocate, that shard. In the worst-case scenario this ends in an endless
allocation loop. To prevent this loop and all its side effects, like spamming log files over
and over again, this commit adds an allocation decider that stops allocating a shard that
has failed to allocate more than N times in a row. The number of retries can be configured via
`index.allocation.max_retry` and its default is set to `5`. Once the setting is updated,
shards with fewer failures than the number set per index will be allowed to allocate again.

Internally we maintain a counter on the UnassignedInfo that is reset to `0` once the shard
has been started.
Relates to elastic#18417
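
As a rough illustration of the check described in the commit message above, here is a sketch using a simplified stand-in for UnassignedInfo; the setting name and default of `5` are quoted from the message, while the field and method names are invented:

```java
// Simplified sketch of the per-shard retry check described in the commit message.
// UnassignedInfo here is a stand-in, not the real Elasticsearch class.
public class MaxRetrySketch {

    static final String MAX_RETRY_SETTING = "index.allocation.max_retry"; // default 5 per the commit message

    /** Simplified stand-in for the failure counter kept on UnassignedInfo. */
    static final class UnassignedInfo {
        int numFailedAllocations; // reset to 0 once the shard has been started
    }

    static boolean canAllocate(UnassignedInfo info, int maxRetries) {
        // NO once the shard has failed maxRetries times in a row; raising the
        // index setting allows allocation to be attempted again.
        return info.numFailedAllocations < maxRetries;
    }

    public static void main(String[] args) {
        UnassignedInfo info = new UnassignedInfo();
        info.numFailedAllocations = 5;
        System.out.println(MAX_RETRY_SETTING + "=5 -> " + canAllocate(info, 5)); // false
        System.out.println(MAX_RETRY_SETTING + "=6 -> " + canAllocate(info, 6)); // true
    }
}
```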
s1monw added a commit that referenced this issue May 20, 2016
gmoskovicz (Contributor) commented
@ywelsch @s1monw is there any news on this?

Some OSes will remount a disk as read-only, and when that happens the entire cluster has issues with RED shards and shards that won't move. Perhaps this could help on that front?

@lcawl added the :Distributed Indexing/Distributed label and removed the :Allocation label Feb 13, 2018
@DaveCTurner added the :Distributed Coordination/Allocation label Mar 15, 2018
elasticmachine (Collaborator) commented
Pinging @elastic/es-distributed

@DaveCTurner removed the :Distributed Indexing/Distributed label Mar 15, 2018
bleskes (Contributor) commented Mar 21, 2018

We have another, non-trivial, instance of this in shard fetching. When it hard-fails on a node (rather than succeeding by finding a broken copy) we currently redo the fetching. This is an easy way around networking issues but can be poisonous on disk failures (for example).

idegtiarenko (Contributor) commented

We would rather remove the broken node from the cluster than accept failed allocation(s).
