Remove node from cluster when node locks are broken #58373

@DaveCTurner

In #52680 we are introducing a mechanism that will allow nodes to remove themselves from the cluster if they locally determine themselves to be unhealthy. The only check today is that their data paths are all empirically writeable. We could also check NodeEnvironment#assertEnvIsLocked() here; indeed we already call this method during the health check but do not consider a failure to be fatal (see #52680 (comment)).

A broken node lock today blocks things like allocating new shards to the node, but I think it does not block indexing or searching on existing shards, since these are protected by shard-level locks instead. On the other hand, there's something very wrong with your environment if the node lock is broken, and it seems reasonable to treat it pretty seriously.
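
To make the proposal concrete, here is a minimal sketch of a local health check that treats a failed lock assertion as fatal alongside the existing writeability check. It is not the code from #52680: `LocalHealthSketch`, `Status`, `LockCheck`, and `isWritable` are hypothetical names, and `LockCheck` is only a stand-in for a call such as NodeEnvironment#assertEnvIsLocked(), assumed here to throw when the lock is no longer valid.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;

// Hypothetical sketch only; none of these types exist in Elasticsearch.
class LocalHealthSketch {

    enum Status { HEALTHY, UNHEALTHY }

    // Stand-in for a node-lock assertion. Assumption: it throws if the lock is broken.
    interface LockCheck {
        void assertLocked() throws Exception;
    }

    // Reports UNHEALTHY if any data path is not writeable or the lock check throws.
    static Status check(List<Path> dataPaths, LockCheck lockCheck) {
        for (Path dataPath : dataPaths) {
            if (!isWritable(dataPath)) {
                return Status.UNHEALTHY;
            }
        }
        try {
            lockCheck.assertLocked();
        } catch (Exception e) {
            // The change proposed here: a broken node lock is fatal rather than ignored.
            return Status.UNHEALTHY;
        }
        return Status.HEALTHY;
    }

    // Empirical writeability test: create, write, and delete a small probe file.
    static boolean isWritable(Path dir) {
        try {
            Path probe = Files.createTempFile(dir, ".health-probe", ".tmp");
            Files.write(probe, new byte[] {1});
            Files.delete(probe);
            return true;
        } catch (IOException | RuntimeException e) {
            return false;
        }
    }
}
```

In this sketch a node reporting `UNHEALTHY` would then go through whatever removal mechanism #52680 introduces; that part is deliberately left out.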

Metadata

    Labels

    :Distributed Coordination/Cluster Coordination: Cluster formation and cluster state publication, including cluster membership and fault detection.
    >enhancement
    Team:Distributed (Obsolete): Meta label for distributed team (obsolete). Replaced by Distributed Indexing/Coordination.
