Skip to content

Conversation

@budde
Copy link
Contributor

@budde budde commented Feb 2, 2022

Make sure a previous cluster member is marked as .down before replacing it

Motivation:

Modifications:

Result:

@budde
Copy link
Contributor Author

budde commented Feb 2, 2022

Also have a down-replaced-cluster-members-0.x in my fork against the release/0.x branch

@ktoso
Copy link
Member

ktoso commented Feb 2, 2022

Whoa that's a nasty compiler crash there... Thank you for looking into this I will have a look!

I suggest we focus on the 0.x branch with this fix as the main branch remains in flux for a while still (with the language being in flight and all). So the failures are none of our own really but compiler issues like this one...

@ktoso ktoso self-requested a review February 2, 2022 06:34
@ktoso
Copy link
Member

ktoso commented Feb 2, 2022

Change looks good to me! I'll reproduce and play around a bit more tomorrow -- can you submit the PR to 0.x and let's leave main be for the time being? I'll forward port the patch later then 👍

@budde budde changed the base branch from main to release/0.x February 2, 2022 06:36
@budde budde force-pushed the down-replaced-cluster-members branch from 2a1cd0b to 3da8349 Compare February 2, 2022 06:37
@ktoso
Copy link
Member

ktoso commented Feb 2, 2022

Minor typo in the integration test :)

16:29:43 /code/IntegrationTests/tests_04_cluster/test_02_ungraceful_shutdown_recovery.sh: line 31: 
thrid_logs: unbound variable

@ktoso ktoso marked this pull request as ready for review February 2, 2022 08:39
@budde budde changed the title [DRAFT] Down replaced cluster members Down replaced cluster members Feb 2, 2022
@ktoso
Copy link
Member

ktoso commented Feb 3, 2022

Hmmm the integration test is not aggressive enough perhaps since it does not reproduce the issue but I think the fix looks
right...

@budde
Copy link
Contributor Author

budde commented Feb 3, 2022

Good catch, was making a few silly assumptions in the integ test. Think the typo you caught might've been contributing to why I was getting it to fail initially ;)

Updated the test. Should fail deterministically without the fix.

@ktoso
Copy link
Member

ktoso commented Feb 4, 2022

Great, reproduced the failure and confirmed the fix -- looking good, thank you a lot @budde !

I'll cut a release for this as well :)

@ktoso ktoso merged commit 4b6768b into apple:release/0.x Feb 4, 2022
ktoso added a commit to ktoso/swift-distributed-actors that referenced this pull request May 31, 2022
* Make sure replaced cluster members are .down

* Add integration test for ungraceful shutdown

* Remove dead code from integ test

* Remove more dead code

* Update MembershipTests for changed replacement semantics

* remove whitespace that made it in

* fix typo in test

* Fix issues in integration test

* Fix formatting

Co-authored-by: Konrad `ktoso` Malawski <[email protected]>
ktoso added a commit that referenced this pull request Jun 1, 2022
* Make sure replaced cluster members are .down

* Add integration test for ungraceful shutdown

* Remove dead code from integ test

* Remove more dead code

* Update MembershipTests for changed replacement semantics

* remove whitespace that made it in

* fix typo in test

* Fix issues in integration test

* Fix formatting

Co-authored-by: Konrad `ktoso` Malawski <[email protected]>

Co-authored-by: Adam Budde <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Gossip re-adds nodes replaced after ungraceful shutdown

2 participants