Testing k8s ha configuration by shutting down the first k8s master node #6

Closed
kcao3 opened this issue Oct 24, 2017 · 4 comments


kcao3 commented Oct 24, 2017

@cookeem, I followed your instructions and was able to deploy an HA Kubernetes cluster (with 3 k8s master nodes and 2 k8s nodes) using Kubernetes version 1.8.1.
Everything seemed to work just as you described in the instructions.

Next, I focused on testing the high availability configuration. To do so, I attempted to shut down the first k8s master. Once the first k8s master was brought down, the keepalived service on that node stopped and the virtual IP address transferred to the second k8s master. However, things started falling apart :(

Specifically, on the second (or third) master, running 'kubectl get nodes' shows something like the following:

NAME          STATUS     ROLES    ...
k8s-master1   NotReady   master   ...
k8s-master2   Ready               ...
k8s-master3   Ready               ...
k8s-node1     Ready               ...
k8s-node2     Ready               ...

Also, on k8s-master2 or k8s-master3, when I ran 'kubectl logs' to check the controller-manager and scheduler, it appeared they did NOT re-elect a new leader. As a result, all of the Kubernetes services that were exposed before were no longer accessible.

Do you have any idea why the re-election process did NOT occur for the controller-manager and scheduler on the remaining k8s master nodes?
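
For reference, another way to check which instance currently holds the leader lock (just a sketch, assuming the default endpoints-based leader election that kubeadm sets up in 1.8) would be something like:

$ kubectl -n kube-system get endpoints kube-controller-manager \
        -o jsonpath='{.metadata.annotations.control-plane\.alpha\.kubernetes\.io/leader}'
$ kubectl -n kube-system get endpoints kube-scheduler \
        -o jsonpath='{.metadata.annotations.control-plane\.alpha\.kubernetes\.io/leader}'

The holderIdentity field in that annotation should switch to one of the surviving masters shortly after a failover.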


cookeem commented Oct 25, 2017

I haven't tested version 1.8.x yet.
On master1, master2 and master3, edit the server setting in the kube-apiserver.yaml, kubelet.conf, admin.conf, controller-manager.conf and scheduler.conf files to point to the current host's IP address, and check whether it works.

If it works, I think the problem is keepalived; check keepalived's log.

If it does not work, check the kubelet's log.

Then please show me the logs.
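
For example, roughly (a sketch only, assuming systemd-managed keepalived and kubelet, the 192.168.60.80:8443 virtual IP from this guide, and a placeholder for the host's own IP):

$ grep 'server:' /etc/kubernetes/*.conf                  # see which endpoint each kubeconfig points at
$ sudo sed -i 's#https://192.168.60.80:8443#https://<this-host-ip>:6443#' /etc/kubernetes/*.conf
$ sudo systemctl restart kubelet
$ journalctl -u keepalived --no-pager | tail -n 50       # did the virtual IP actually move?
$ journalctl -u kubelet --no-pager | tail -n 50          # is kubelet reaching an apiserver?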


kcao3 commented Oct 25, 2017

It turns out the re-election process for the controller-manager and scheduler running on the k8s master nodes worked just fine. Keepalived was working just fine as well.

The root cause of the problem is the setting in the 'cluster-info' configmap in the kube-public namespace. So, in addition to the 'kube-proxy' configmap in the kube-system namespace, I had to edit the 'cluster-info' configmap and replace the host IP address:6443 with the virtual IP address:8443. This is extremely important so that any new worker node bootstraps with the correct configuration when joining the cluster using kubeadm join. For my 2 existing k8s nodes, I just manually updated /etc/kubernetes/kubelet.conf, restarted the docker and kubelet services on these nodes, and everything works as expected :)
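
Roughly what I did (a sketch; <master1-ip> is a placeholder for the first master's address, 192.168.60.80 is the virtual IP from this guide):

$ kubectl -n kube-public edit configmap/cluster-info    # change server: https://<master1-ip>:6443 to https://192.168.60.80:8443
$ kubectl -n kube-system edit configmap/kube-proxy      # same server: change as in the instructions

And on each existing worker node:

$ sudo sed -i 's#https://<master1-ip>:6443#https://192.168.60.80:8443#' /etc/kubernetes/kubelet.conf
$ sudo systemctl restart docker kubelet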

Thank you so much for your prompt response.


cookeem commented Oct 26, 2017

In my instructions there's a "kube-proxy configuration" section; is this the config you mean?

$ kubectl edit -n kube-system configmap/kube-proxy
        server: https://192.168.60.80:8443

@discordianfish

Keep in mind that the next kubeadm init will override the kube-proxy configmap.
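
If that happens, a rough sketch of re-applying the fix (assuming the kubeadm default k8s-app=kube-proxy label and the 192.168.60.80:8443 virtual IP used in this guide):

$ kubectl -n kube-system edit configmap/kube-proxy          # set server: https://192.168.60.80:8443 again
$ kubectl -n kube-system delete pod -l k8s-app=kube-proxy   # recreate the kube-proxy pods so they pick up the change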
