Fix a bug when vm hosting master mongodb is down #80

xinsfang · 2018-04-11T10:35:10Z

Pod mongodb-0 (master) was running on node master1. After master1 was shut down, mongodb-0's STATUS was shown as "Unknown", whereas pod.status.phase was still "Running". As a result, mongodb-0 was repeatedly selected as the cluster's master and then removed from the cluster because it was not reachable, which prevented the other cluster members from becoming master. The sidecar needs to check the correct field for pod status.
----logs---
$ kubectl logs -n=maglev-system mongodb-2 -c mongo-sidecar
...
Pod has been elected as a secondary to do primary work
Addresses to add: [ 'mongodb-0.mongodb.maglev-system.svc.cluster.local:27017' ]
...

$ kubectl get po mongodb-0 -n=maglev-system
NAME READY STATUS RESTARTS AGE
mongodb-0 3/3 Unknown 0 20h
$ kubectl get po mongodb-0 -n=maglev-system -oyaml | grep phase
phase: Running
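
To illustrate the mismatch shown in the output above, here is a minimal sketch in Python of a status check that does not rely on `phase` alone. It is not the sidecar's actual code. It assumes the pod is available as a plain dict shaped like the `-oyaml` output, and that a pod on an unreachable node carries `status.reason` set to `"NodeLost"` and a `metadata.deletionTimestamp`; the function name `pod_is_usable` and the sample pods are hypothetical.

```python
def pod_is_usable(pod: dict) -> bool:
    """Return True only if the pod is Running and not marked as lost."""
    status = pod.get("status", {})
    metadata = pod.get("metadata", {})

    # phase alone is not enough: it can stay "Running" after the node is gone.
    if status.get("phase") != "Running":
        return False

    # Assumption: a pod on an unreachable node has reason "NodeLost".
    if status.get("reason") == "NodeLost":
        return False

    # A pod that is scheduled for deletion should not be a candidate either.
    if metadata.get("deletionTimestamp"):
        return False

    return True


# Hypothetical sample pods, for illustration only.
lost_pod = {
    "metadata": {"name": "mongodb-0", "deletionTimestamp": "2018-04-11T10:00:00Z"},
    "status": {"phase": "Running", "reason": "NodeLost"},
}
healthy_pod = {
    "metadata": {"name": "mongodb-2"},
    "status": {"phase": "Running"},
}

print(pod_is_usable(lost_pod), pod_is_usable(healthy_pod))
```

With a check like this, a pod in the state described above would be skipped even though its phase still reads "Running".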


bbbmj · 2018-06-04T02:04:11Z

LGTM
We ran into the same issue.
cc @cvallance

abuzhynsky · 2019-04-03T12:34:59Z

@xinsfang @bbbmj Thanks for your PR!
But maybe it's better to filter out the pods in the NodeLost state at the beginning of the workLoop? Also, the existing master can be on the lost node, so the sidecar will potentially be unable to watch pod changes.
I've created a new PR #98 with these changes; please take a look at it.
Thanks.
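
The suggestion to filter early could look roughly like the Python sketch below. It is only an illustration of the ordering (drop lost pods first, then elect), not the code from PR #98. The names `is_node_lost`, `elect_master` and `work_loop_once` are hypothetical, the election rule (lowest pod name) is a stand-in, and the `"NodeLost"` value of `status.reason` is an assumption.

```python
def is_node_lost(pod: dict) -> bool:
    """Assumption: lost pods are marked with status.reason == "NodeLost"."""
    return pod.get("status", {}).get("reason") == "NodeLost"


def elect_master(pods: list) -> dict:
    """Stand-in election rule: pick the pod with the lowest name."""
    return min(pods, key=lambda p: p["metadata"]["name"])


def work_loop_once(pods: list):
    """One iteration: filter out lost pods before any election happens."""
    candidates = [p for p in pods if not is_node_lost(p)]
    if not candidates:
        return None  # nothing reachable to elect from
    return elect_master(candidates)


# Hypothetical pod list: mongodb-0 sits on the lost node.
pods = [
    {"metadata": {"name": "mongodb-0"}, "status": {"phase": "Running", "reason": "NodeLost"}},
    {"metadata": {"name": "mongodb-1"}, "status": {"phase": "Running"}},
    {"metadata": {"name": "mongodb-2"}, "status": {"phase": "Running"}},
]

master = work_loop_once(pods)
print(master["metadata"]["name"])
```

Because the lost pod never reaches the election step, it cannot be selected again on the next iteration, which is the loop described in the original report.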

abuzhynsky mentioned this pull request Apr 3, 2019

Fix the NodeLost pod's status #98

Open
