Akka.Cluster.Tools.Singleton / Akka.Cluster.Sharding: fix duplicate shards caused by incorrect ClusterSingletonManager HandOver #7297
Eliminates the source of akkadotnet#6793, which was caused by using the wrong member ordering to determine which `ClusterSingletonManager` to hand over to during member state transitions. close akkadotnet#6973 close akkadotnet#7196
Aaronontheweb
left a comment
This resolves one of the nastiest bugs we've ever debugged - I knew it would be a stupid problem like this, but tracking it down was a herculean effort. Hundreds of hours spent between @Arkatufus and myself.
    - _memberAgeComparer = settings.ConsiderAppVersion
    -     ? MemberAgeOrdering.DescendingWithAppVersion
    -     : MemberAgeOrdering.Descending;
    + _memberAgeComparer = Member.AgeOrdering;
This is the bug fix - descending order of members by age means we always get THE YOUNGEST at the front of the membersByAge list. Members MUST BE IN ASCENDING ORDER, OLDEST FIRST. I'm not sure how far back this bug goes - at least to v1.5. I'll add another comment after I've dug that up.
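To illustrate why the comparer direction matters, here is a minimal, self-contained sketch. It does not use the Akka.NET types: `Node`, `UpNumber`, `AgeOrderingSketch`, and `FirstAddress` are hypothetical stand-ins, and the sketch assumes that a lower `UpNumber` means an older member. It only shows that whichever comparer sorts the list decides which member ends up at the front.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-in for a cluster member (not the Akka.NET Member type).
// Assumption: UpNumber grows as nodes join, so a lower value means an older member.
public sealed record Node(string Address, int UpNumber);

public static class AgeOrderingSketch
{
    // Oldest first: the order the review comment says the list must have.
    public static readonly IComparer<Node> Ascending =
        Comparer<Node>.Create((a, b) => a.UpNumber.CompareTo(b.UpNumber));

    // Youngest first: the order the review comment identifies as the bug.
    public static readonly IComparer<Node> Descending =
        Comparer<Node>.Create((a, b) => b.UpNumber.CompareTo(a.UpNumber));

    // Returns the address of whichever node sorts to the front of the list.
    public static string FirstAddress(IEnumerable<Node> nodes, IComparer<Node> comparer)
        => nodes.OrderBy(n => n, comparer).First().Address;

    public static void Main()
    {
        var nodes = new[]
        {
            new Node("node-b", 2),
            new Node("node-a", 1), // oldest
            new Node("node-c", 3), // youngest
        };

        Console.WriteLine(FirstAddress(nodes, Ascending));  // prints "node-a"
        Console.WriteLine(FirstAddress(nodes, Descending)); // prints "node-c"
    }
}
```

With the descending comparer, any logic that treats the first element as the oldest member picks the youngest one instead, which is the kind of mix-up described in the comment above.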
No longer needed, sticking with DRY especially since Member.AgeOrdering is already thoroughly sanity checked via #7291
    - _memberAgeComparer = considerAppVersion
    -     ? MemberAgeOrdering.DescendingWithAppVersion
    -     : MemberAgeOrdering.Descending;
    + _memberAgeComparer = Member.AgeOrdering;
Uses the same fix as the SingletonProxy
    - else
    - {
    -     Unhandled(message);
    + case ClusterEvent.CurrentClusterState state:
ReSharper'd this to use a switch statement. No biggie.
      {
      }

    - protected TestException(SerializationInfo info, StreamingContext context)
Obsolete, deleted it to remove a build warning.
Going to put this into our test lab with a private build and verify the fix overnight.
Looked at the history - apparently this has ALWAYS been a bug with Akka.Cluster.Tools, going all the way back to when the Cluster Singleton was first introduced in #1530.
We'll see if tests and the MNTR need to be adjusted, but our tracing system found definitive proof that the hand-off was being screwed up, which led to two
Changes
Eliminates the source of #6793, which was caused by using the incorrect ordering methodology when it came to determining which `ClusterSingletonManager` to hand-over to during member state transitions.
close #6973
close #7196
Supersedes #7287 and #7197
Checklist
For significant changes, please ensure that the following have been completed (delete if not relevant):
`AppVersion` joins cluster #7197