Gossip re-adds nodes replaced after ungraceful shutdown #866

Commit hash: `dcd618bcf93b60bdf19ca1b29c24c4a08d30615a`

(HEAD of `release/0.x`; the behavior should be the same on HEAD of the `main` branch)

Context:

If a node in an actor cluster is killed ungracefully (say, via `SIGKILL`) and its replacement is respawned immediately, the previous node is still considered `.up` because SWIM has not yet reached consensus that it is `.down`. In this scenario, `Cluster.Membership.removeCompletely()` is called [when processing the `.joining` MembershipChange](https://github.com/apple/swift-distributed-actors/blob/662e08172ca54e2a43da84381b5b07fe3fe6a070/Sources/DistributedActors/Cluster/Cluster%2BMembership.swift#L304) for the replacement node, which immediately removes the previous node from the cluster membership. However, if the cluster subsequently receives a gossip update that still lists the previous node as `.up`, then `Cluster.Membership.mergeFrom()` [creates a MembershipChange directive](https://github.com/apple/swift-distributed-actors/blob/662e08172ca54e2a43da84381b5b07fe3fe6a070/Sources/DistributedActors/Cluster/Cluster%2BMembership.swift#L527) that adds the previous node back.

The replacement node is then marked `.down` by the downing strategy (it is unclear exactly how this happens), while the previous node stays in the cluster membership marked as `.up`. This state persists across additional restarts.
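
To make the sequence concrete, here is a minimal toy model of the re-add, not the library's actual code. The types `Node`, `Status`, and `ToyMembership`, the `nid` field, and the addresses are all hypothetical; the sketch assumes a merge that simply adopts any member it does not know locally, which may be simpler than what `mergeFrom()` really does.

```swift
// Toy model only: hypothetical types, not the swift-distributed-actors API.
enum Status { case joining, up, down }

struct Node: Hashable {
    let endpoint: String  // host:port, reused by the replacement process
    let nid: Int          // assumed unique per process incarnation
}

struct ToyMembership {
    var members: [Node: Status] = [:]

    // A node joins: drop any older incarnation on the same endpoint,
    // then record the new node as .joining.
    mutating func join(_ node: Node) {
        let stale = members.keys.filter { $0.endpoint == node.endpoint && $0.nid != node.nid }
        for old in stale {
            members.removeValue(forKey: old)
        }
        members[node] = .joining
    }

    // Naive merge (assumption): any member unknown locally is adopted
    // with the status the peer reports.
    mutating func mergeFrom(_ incoming: ToyMembership) {
        for (node, status) in incoming.members where members[node] == nil {
            members[node] = status
        }
    }
}

let a = Node(endpoint: "10.0.0.1:7337", nid: 1)
let b1 = Node(endpoint: "10.0.0.2:7337", nid: 2)  // killed with SIGKILL
let b2 = Node(endpoint: "10.0.0.2:7337", nid: 3)  // replacement, same endpoint

var local = ToyMembership(members: [a: .up, b1: .up])
let stalePeerView = local  // a peer's view taken before the replacement joined

local.join(b2)                  // b1 is removed, b2 is .joining
local.mergeFrom(stalePeerView)  // the stale gossip brings b1 back as .up
```

In this model the membership ends up holding both `b1` (as `.up`) and `b2` (as `.joining`), because nothing records that `b1` was removed on purpose.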

Steps to reproduce:

1. Run any actor cluster with 3 nodes, using a static IP + port combination for each node
2. `kill -9` the process for one node
3. Immediately restart that node using the same IP + port

