Wait for control plane node to be ready after joining the cluster #598
@@ -211,6 +212,16 @@ func runKubeadmJoinControlPlane(
		return errors.Wrap(err, "failed to join a control plane node with kubeadm")
	}

	// Wait for the node to be Ready
	// TODO: remove once https://github.com/kubernetes-sigs/kind/issues/588 is fixed
	// kubeadm join should guarantee that the cluster is ready
@fabriziopandini @neolit123 I don't know if this is true 😅. Should kubeadm join guarantee that?
@aojea see my comments on the issue #588 (comment)
In a nutshell, kubeadm is not responsible for this; the kubelet is. Additionally, it seems that the API server does not properly detect when the etcd instance is ready.
> kubeadm join should guarantee that the cluster is ready
well kubeadm can look at pod and endpoint state, but the cluster as a whole - a bit tricky.
kinder currently waits for these pods + the node ready status, as @fabriziopandini mentioned:
https://github.com/kubernetes/kubeadm/blob/62556834c87e34004ac84c17b2f2c68b5c4f3b22/kinder/pkg/actions/waiter.go#L32-L44
given HA join consistently does not fail using kind 0.2.0 as seen here:
https://k8s-testgrid.appspot.com/sig-cluster-lifecycle-kubeadm#kubeadm-kind-master
i'm trying to get to the bottom of the problem instead - i.e. finding a change in kind that helped the problem surface.
kind is much faster in the latest versions. The fact that adding delays solves the problem, or at least reduces it, makes me think the failure is tightly related to that speedup.
/assign @neolit123
[APPROVALNOTIFIER] This PR is NOT APPROVED
This pull-request has been approved by: aojea
Code looks fine FWIW but I'm not convinced we've fully root-caused the issue we're attempting to address yet, and I'm not a fan of adding arbitrary busy waits.
Definitely not the right approach
The control plane node may not be completely ready right after joining the cluster.
If a worker node tries to join against a control plane node that is not ready, the join fails and so does the cluster creation.
This is a workaround that waits until the control plane node is ready after it joins the cluster, before joining new nodes.
Fixes: #588