This repository was archived by the owner on Feb 18, 2025. It is now read-only.

Description
The following code snippet shows what happens when returning slaves of a cluster which could be used as OSC control replicas:
In a cluster spanning 3 datacentres, we noticed replication delay building up in one of them. The current code looks for the 2 busiest intermediate masters, ranked by number of replicas, and then looks for leaf nodes below those servers. If you have more than one datacentre or AZ, this ignores the other locations: latency and load on the cluster may be enough to delay replicas there during the ongoing OSC, yet those replicas are not being monitored.
The proposed fix would be:
- find at least 2 intermediate masters if possible, 1 per AZ/DC
- use intermediate masters if they are present, and in each AZ/DC choose the intermediate master with the most lower-level replicas
- this almost matches the existing logic, but it removes the hard limit of 2 intermediate masters to check and replaces it with at least 1 per AZ/DC, with a minimum of 2 if possible
The change should be simple, and it would better cover more complex topologies that span multiple locations, preventing unwanted replication delay from going unnoticed across the whole cluster.
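The selection rule proposed above could be sketched roughly as follows. This is only an illustration, not orchestrator's actual code: the `Instance` type, its fields, and `pickIntermediateMasters` are hypothetical names made up for this example, and it assumes the candidate intermediate masters and their replica counts are already known.

```go
package main

import (
	"fmt"
	"sort"
)

// Instance is a hypothetical, simplified stand-in for a topology node.
// It is NOT orchestrator's own instance type.
type Instance struct {
	Host       string
	DataCenter string
	Replicas   int // number of lower-level replicas below this server
}

// pickIntermediateMasters returns the busiest intermediate master in each
// DC/AZ, then tops the selection up with the next-busiest candidates until
// at least minCount are chosen or no candidates remain.
func pickIntermediateMasters(candidates []Instance, minCount int) []Instance {
	// Keep only servers that actually have replicas (intermediate masters).
	sorted := make([]Instance, 0, len(candidates))
	for _, c := range candidates {
		if c.Replicas > 0 {
			sorted = append(sorted, c)
		}
	}
	// Busiest first; break ties by host name so the result is deterministic.
	sort.SliceStable(sorted, func(i, j int) bool {
		if sorted[i].Replicas != sorted[j].Replicas {
			return sorted[i].Replicas > sorted[j].Replicas
		}
		return sorted[i].Host < sorted[j].Host
	})

	picked := []Instance{}
	seenDC := map[string]bool{}
	var rest []Instance

	// First pass: one intermediate master per DC/AZ, the busiest in each.
	for _, c := range sorted {
		if seenDC[c.DataCenter] {
			rest = append(rest, c)
			continue
		}
		seenDC[c.DataCenter] = true
		picked = append(picked, c)
	}

	// Second pass: top up to the minimum (2 in the proposal) if possible.
	for _, c := range rest {
		if len(picked) >= minCount {
			break
		}
		picked = append(picked, c)
	}
	return picked
}

func main() {
	// Made-up topology: the two busiest intermediate masters share dc1.
	candidates := []Instance{
		{Host: "im1", DataCenter: "dc1", Replicas: 5},
		{Host: "im2", DataCenter: "dc1", Replicas: 4},
		{Host: "im3", DataCenter: "dc2", Replicas: 1},
		{Host: "im4", DataCenter: "dc3", Replicas: 2},
	}
	for _, im := range pickIntermediateMasters(candidates, 2) {
		fmt.Println(im.Host, im.DataCenter)
	}
}
```

In this made-up topology, simply taking the 2 busiest intermediate masters would select `im1` and `im2`, both in `dc1`, leaving `dc2` and `dc3` unmonitored. The sketch instead selects one server from each of the three locations. With a single location it falls back to the 2 busiest, which matches the existing behaviour as described in this issue.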
And please understand that this issue may not be addressed immediately or in a timeframe you were expecting.
Yes. I'm aware that orchestrator is not being maintained at the moment, but I think it's good to record issues in case someone later has time to fix them, and to share them with other users who may not be aware of the issue.